Self-Acceptance and Adjustment
Charles Taylor and Arthur W. Combs 


School of Education, Syracuse University 


In recent years a great deal of attention has 
been given to the interpretation of behavior in 
terms of self theory. Rogers [4], for example, 
has defined the well-adjusted individual as one 
able to accept all perceptions, including those 
about self, into his personality organization. 
He describes the situation as follows: “It 
would appear that when all of the ways in 
which the individual perceives himself—all 
perceptions of the qualities, abilities, impulses, 
and attitudes of the person, and all percep- 
tions of himself in relation to others—are ac- 
cepted into the organized conscious concept of 
the self, then this achievement is accompanied
by feelings of comfort and freedom from ten- 
sion which are experienced as psychological ad- 
justment” [4, p. 364]. He points out that this 
relationship between self-acceptance and ad- 
justment is a commonly observed phenomenon 
in client-centered therapy and seems to increase 
in the client as therapy progresses and adjust- 
ment improves. 

Combs [1] and Snygg and Combs [5], a- 
dapting Rogers’ definition to their phenomen- 
ological interpretation, have described the well- 
adjusted individual in terms of the adequacy 
of the self organization. They define the ad- 
equate self as follows: “A phenomenal self is 
adequate in the degree to which it is capable 
of accepting into its organization any and all 
aspects of reality” [5, p. 136]. They point out 
that the individual who feels inadequate to 
deal with his perceptions of reality feels
threatened by such perceptions and is likely 
to reject or distort them. The maladjusted 
person, in phenomenological terms, is charac- 
terized by many threatening perceptions, and 
his maladjusted behavior occurs largely as a re- 
sult of his attempts to deal with the threats 
to which he feels himself subjected. In this 
sense a maladjusted person is synonymous with 
a threatened one. 
Murphy [3] and Lecky [2] have taken
similar positions with respect to the problem 
of adjustment. If the position taken by these 
authors is accurate, it should be possible to 
demonstrate a close relationship between self- 
acceptance and external measures of adjust- 
ment. This is the problem we have sought to 
investigate in this study. 


The Rationale of this Experiment 


In line with the theoretical positions out-
lined above, it seemed to us that, if those
positions are accurate, the well-adjusted in-
dividual ought to be better able to accept more
unflattering (and hence threatening) facts
about himself than would be expected of the
less well-adjusted individual. We, therefore,
stated our problem as follows: Given two
groups of children, one better adjusted than
the other by some external criterion; we pre-
dict that the better-adjusted children will be
able to accept more damaging statements about
themselves than will the poorer-adjusted chil-
dren.


The Experimental Design


In selecting a population for our study we 
sought children of approximately similar socio-
economic condition, educational level, and age.
Accordingly, we administered our two instru- 
ments to all sixth-grade children, a population 
of 205, in a group of consolidated rural schools 
in northeastern Pennsylvania in the spring 
of 1949. 

As a rough external measure of adjustment 
we used the California Test of Personality,
Elementary Form A. While this instrument
is admittedly not a refined clinical instrument
for distinguishing between adjusted and mal-
adjusted children, it served our purpose as a
rough screening device that was familiar to 
the teachers and simple to administer and score. 
On the basis of this test we divided our sub-
jects into the upper and lower 50 per cent in
terms of adjustment score obtained on the 
CTP. In this way we arrived at two groups 
of children, one group better adjusted than 
the other on the basis of an external criterion. 
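
The division into upper and lower halves can be sketched as a simple median split. This is only an illustrative sketch: the subject ids and scores below are hypothetical, and the rule that scores tied at the median go to the upper group is an assumption, since the paper does not say how ties were handled.

```python
from statistics import median


def split_by_median(scores):
    """Split subjects into (lower, upper) halves at the median score.

    Assumption: a score equal to the median is placed in the upper half.
    """
    cut = median(scores.values())
    lower = [sid for sid, score in scores.items() if score < cut]
    upper = [sid for sid, score in scores.items() if score >= cut]
    return lower, upper


# Hypothetical CTP adjustment scores keyed by made-up subject ids.
example_scores = {"s1": 31, "s2": 72, "s3": 45, "s4": 66, "s5": 28, "s6": 59}
lower_half, upper_half = split_by_median(example_scores)
```

With real data the two halves would then be compared on the number of damaging statements each child checked.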

To test the degree to which these children 
would accept damaging statements about them- 
selves, we prepared a list of statements that 
seemed to us to be “probably true” of all chil- 
dren, yet damaging to self if admitted. In the 
construction of this list we solicited the aid of 
graduate students and faculty in suggesting 
items and criticizing items in our list. As a 
result of this informal consensus, we finally 
agreed on twenty statements that seemed likely 
to be true of all children. They were the fol-
lowing:


1. I sometimes disobey my parents.
2. I sometimes say bad words or swear.
3. I sometimes copy or cheat on school work.
4. I sometimes am rude to older people.
5. I sometimes tell lies.
6. I sometimes make fun of other schoolmates.
7. I sometimes pretend to forget things I am
   supposed to do.
8. I sometimes steal things when I know I will
   not be caught.
9. I sometimes fib to my classmates.
10. I sometimes pretend to be sick to get out of
    things.
11. I sometimes am unkind to younger children.
12. I sometimes am lazy and won’t do my work.
13. I sometimes tell dirty stories.
14. I sometimes cheat in games.
15. I sometimes am unruly at school.
16. I sometimes do not brush my teeth on pur-
    pose.
17. I sometimes talk back to my mother.
18. I sometimes am mean to animals.
19. I sometimes waste my time when I should be
    working.
20. I sometimes show off in front of other chil-
    dren.


This list was mimeographed and presented 
to all of the children of the sixth grades who 
were in school on the day of the administra- 
tion two weeks after completing the CTP. 
They were told that this was a list of things 
that boys and girls sometimes did and that we 
were interested in finding out which of these 
were true of the sixth graders in this school. 
They were told further that we did not want 
to know which children did these things, but 
only about the group as a whole, and for this 
reason we did not want them to put their
names on the papers. Instead, we asked them 
to check those statements true for them, fold 
the paper, and hand it in. 

By means of an obscured code it was pos- 
sible for us to identify the person who mar- 
ked each paper in spite of the fact that no 
names were placed on the paper by the child- 
ren themselves. In this way we obtained 
CTP’s and our damaging statements list for 
105 boys and 75 girls of the original group. 
We felt somewhat guilty about betraying these 
children but concluded that, perhaps, for the 
sake of investigation, it could be excused. 


Results of the Experiment 


Table 1 presents a summary of the results
of our study. It seems quite clear that the
differences shown in the table below are not
likely to be due to chance alone. It would ap-
pear that our prediction in this study was amply
corroborated. Apparently the relationship be-
tween ability to accept damaging statements
about self and adjustment is a real one and
can be experimentally demonstrated.


Table 1

Mean Adjustment Scores, Mean Number of Dam-
aging Items Checked, and Critical Ratios

                        Mean
                      Damaging
              Mean     Items               SE
              CTP     Checked    S.D.     Mean     CR

Boys
  Lower 50%   33.4      5.1      2.5      .35
                                                  5.98
  Upper 50%   68.7      9.2      4.2      .59
Girls
  Lower 50%   36.5      5.8      2.9      .48
                                                  3.78
  Upper 50%   69.1      8.4      3.2      .53

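
The critical ratios in Table 1 can be checked roughly from the printed means and standard errors. The sketch below assumes the usual formula for two independent groups (difference between means divided by the square root of the summed squared standard errors); the paper does not state which formula was used. It also assumes that the standard errors read .35, .59, .48, and .53, which is what the printed standard deviations and group sizes imply.

```python
import math


def critical_ratio(mean_a, se_a, mean_b, se_b):
    """Difference between two means divided by the standard error of the difference.

    Assumes independent groups, so the squared standard errors simply add.
    """
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return (mean_a - mean_b) / se_diff


# Upper 50% versus lower 50%, mean number of damaging items checked.
boys_cr = critical_ratio(9.2, 0.59, 5.1, 0.35)
girls_cr = critical_ratio(8.4, 0.53, 5.8, 0.48)
print(f"boys CR = {boys_cr:.2f}, girls CR = {girls_cr:.2f}")
```

Under these assumptions the boys' value agrees with the printed 5.98, while the girls' value comes out near 3.64, somewhat below the printed 3.78. The gap may reflect rounding in the published figures or a slightly different formula.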

The general tendency of the results shown 
in the statistical presentation (Table 1) was 
also observed in our examination of some of 
the individual cases where children had re- 
ceived very high or very low scores on the 
CTP. One boy, for example, who obtained 
the highest adjustment score in the group, 
marked all but one of the items on our damag- 
ing statement list as true of himself. The 
opposite tendency was true for cases of chil-
dren with very low adjustment ratings. In-
deed, it seems clear that the above results
would be very much magnified by taking the 
upper and lower 25 per cent groups rather 
than the upper and lower 50 per cent. 

One interesting idea that occurred to us in 
connection with these results is this: Most
personality inventories are based upon the in- 
dividual’s marking or accepting statements 
about himself. While some of these statements 
may not appear to be damaging from his point 
of view, many of them may. If the results of 
this study are accurate we should then expect 
not the poorly-adjusted but the well-adjusted 
person to mark more damaging statements as 
true for himself. Indeed, that could even have 
occurred in the personality test used in this ex- 
periment. It would appear that here is a fruit- 
ful source of further experiment in which the 
hypothesis of this study might be investigated 
with some other measure of adjustment than 
a test based on verbal statements about self. 


If we may hazard a guess in this regard, 
it seems to us that, although this may be a 
serious source of error in paper and pencil per- 
sonality tests, it is probably not a completely 
invalidating factor. Whether an item is re-
garded as damaging by an individual is, after
all, a matter of his own interpretation. It is
probable that many items that would be re-
garded by a psychologist as derogatory are
not so regarded by the subject who perceives 
them purely as a “matter of fact.” Neverthe- 
less, it seems possible that greater attention to 
the problem of how a statement looks to the 
subject, as well as to its ability to discrimin-
ate adjustment from an external criterion,
might result in better personality instruments. 


Summary 


This study is an attempt to demonstrate a
relationship between ability to accept threaten-
ing statements about self and adjustment. It
was predicted that better-adjusted children, as
determined by a commonly used test of person-
ality, would be able to accept more damaging
statements about themselves than would less
well-adjusted children. Sixth-grade children
were divided into better-adjusted and poorer-
adjusted groups on the basis of scores on the
California Test of Personality. Both groups
were then asked to check on a list of twenty
somewhat derogatory statements those true of
themselves. The better-adjusted group checked
significantly more items than did the poorer-
adjusted group.


An Empirical Study of the Concept of
Psychotherapeutic Success¹


Dorothy Clifton Conrad 


Veterans Administration Mental Hygiene Clinic, San Francisco 


What are the implicit factors in a therapist's 
statement that a given patient has been highly 
successful in psychotherapy, or has had only 
low success? What recognized and unrecog- 
nized criteria is he applying? Judgments of 
this type are familiar and necessary in the prac- 
tice of psychotherapy. It is, indeed, common 
practice to utilize a therapist’s judgment as a 
criterion against which other measures of suc- 
cess may be validated. For this reason, we 
set out to explore empirically the concept of 
psychotherapeutic success. 

A definition of individual psychotherapy 
should provide the framework within which 
our study will be meaningful. We shall use 
the word in the sense of a more or less en- 
during process of interpersonal relationship 
involving two individuals as its foci, and hav- 
ing as its aim the production of changes in the 
pattern of living of one of the persons (here 
designated as “the patient”) through some 
form of interaction with a specially trained 
person (here designated as “the therapist”). 

Psychotherapy is a mutual experience, char- 
acterized primarily by modifiable feeling states. 
Whatever report either therapist or patient 
makes in regard to the success of therapy must, 
perforce, be interpreted as evidence of the feel- 
ings of the individual within the relationship: 
e.g., when a therapist says, “Mr. Elson was 
the least successful of all my patients,” it is 
tantamount to his saying, “I feel that the re- 
lationship between Mr. Elson and me led 
to very little change in his pattern of living, 
as compared with my other patients.” 


¹Reviewed in the Veterans Administration and
published with the approval of the Chief Medical 
Director. The statements and conclusions published 
by the author are the result of her own study and 
do not necessarily reflect the opinion or policy of 
the Veterans Administration. 


Method 


Because the study was frankly exploratory, 
and because much of the data was retrospec- 
tive, statistical procedures for determining the 
significance of differences were not employed. 
Emphasis was placed on the terms in which 
one group of psychotherapists defines success. 
These terms are the words embodying the con- 
cepts by which these particular therapists dif- 
ferentiate patients into those of most and those 
of least success, or High and Low. The find- 
ings were not clear-cut, but were neverthe- 
less sufficient to formulate reasonable hypo- 
theses for further, more extensive investiga- 
tions. 

Since even “short-term” therapy is likely 
to extend over several months, the problem of 
building up a sample of cases of differing de- 
grees of success is so difficult that it seemed 
wise to begin the investigation by using pa- 
tients already in therapy. Twenty-five pa- 
tients were designated by the experimenter, 
on the basis of ratings and rankings of their 
respective therapists, as those considered most 
successful in therapy. They are referred to 
as the High group. A contrast group was 
designated from among those considered least 
successful, and is known as the Low group. 

There was no overlapping of the two groups 
in the judgments of the respective therapists. 
Therapists tended to attribute only moderate 
success to the High group, but looked upon 
the Low group as representing an extreme of 
the success continuum. 


Results 


Table 1 shows the tabulation of the thera- 
pists’ spontaneously produced statements of the 
criteria on which they based their judgments 
of High or Low therapeutic success for each 
patient. No limitations of any kind were set
for these statements. Therapists were free to
use as many or as few criteria as they wished,
barring failure to give any. The research
worker was not present at the time the state-
ments were being written. Discussion among
the therapists was at a minimum.


Table 1

Therapists’ Criteria for Evaluation of
Success in Therapy

                                  Frequency of Mention
                              High         Low
                             Success      Success       Total
                             No.   %      No.   %      No.    %

A. The Therapy Situation
   I    Relationship           8    7     10   15       18   10
   II   Insight                9    8      4    6       13    7
   III  Feeling-Affect         8    7      4    6       12    7
   IV   Intellect              3    3      0    0        3    2
   V    Motivation             3    3     16   26       19   11
        Subtotal A            31   28     34   52       65   37

B. The Pattern of Living
   VI   Pathology             23   21     25   38       49  [?]
   VII  Social Conformity     21   19    [?]  [?]      [?]  [?]
   VIII Positive Mental
        Health                36   33      1    1       37   21
        Subtotal B            80   72     33   48      112   63

Grand Total                  111  100     66  100      177  100
Mean (Total)                 4.4          2.6          3.5


Therapists made more statements about the
High group than about the Low. For the
High group, more attention was given to the
pattern of living [1] than to the therapy situ-
ation. For the Low group, these two cate-
gories received approximately equal frequency
of mention. The most marked distinctions
were in the emphasis on inadequate motiva-
tion of the Low group and on the presence of
positive mental health for the High group.

The questionnaire was presented in the form
of incomplete sentences. Two answers were
called for: one to describe the patient as he
came to therapy and the other to refer to the
most recent contact. Table 2 summarizes the
findings for all questions.

Q. 1. “This patient saw the therapist as a per-
son who ....”

Results from this question were very clear-cut.
The High group most frequently began therapy
with much negative affect. They seemed to the
therapists to be characteristically afraid or angry.


Table 2

Summary of Responses to Questionnaire

(Columns: High, Pretherapy and Posttherapy; Low,
Pretherapy and Posttherapy. Cell values for
Sections I to IV are not legible in the source.)

I Patient’s Perception of Therapist
   A  Impersonal object
      A1 Has status
      A2 Lacks status
   B  Person who can dispense help and/or love
      B1 Person who does dispense
      B2 Person who withholds
   C  Person with whom one has a feeling
      relationship
      C1 Source or object of fear or hostility,
         negative affect
      C2 Source of therapeutic help, positive
         affect
II Therapist’s Perception of Patient
   A  Wants nothing from Clinic
   B  Wants to be given something
      B1 Wants non-psychotherapeutic help
      B2 Wants ego-support
   C  Wants to share something
      C1 Wants object for hostile feelings
      C2 Wants person with whom he can release
         and integrate all feelings
   D  Wants to be independent
III and IV (Combined) Areas of Insight
   A  Interpersonal relationships
   B  Determining role of individual in his own
      problems
   C  Individual as a psychosomatic unity
   D  Other answer
V Ways of Handling Feelings

Table 2 (Continued)

Summary of Responses to Questionnaire

(Only the first column, High Pretherapy, is
legible in the source for Sections V to IX.)

                                      High
                                      Pre-
                                      therapy

   A  Acting out                        12
   B  Repression                         8
   C  Somatization                       6
   D  Withdrawal                         5
   E  Fantasy                            4
   F  Projection                         4
   G  Dissociation                       3
   H  Depression                         3
   I  Isolation                          2
   J  Verbalization                      1
   K  Rationalization                    0
   L  Sublimation                        1
   M  Free expression                    1
   N  Denial                             1
      M                                2.0
VI Use of Intellect
   A  For defense                       14
      Not for defense                    3
   B  For insight seeking                3
      Not for insight seeking            1
   C  Self-advancement                   2
      Not for self-advancement           2
   D  Other                              0
VII Integration of Feeling and Affect
   A  None-minimal-confused              7
   B  Poor                              12
   C  Adequate                           1
   D  Good                             [?]
   E  Maximum                            0
   F  Other                              1
VIII Motivation
   A  Need to change                    35
   B  Need to maintain status quo        1
   C  Strong or weak                     3
   D  Other answer                       1
IX Pathology
      Psychotic mechanisms               6
      Somatic complaints                13
      Sexual problems                  [?]
      Conflict directed toward others    5
         1. Dependent                    7
         2. Hostile                     19
      Conflict directed toward self     20
         1. Depression                  12
      M                                3.4


Table 2 (Continued)

Summary of Responses to Questionnaire

                                  High            Low
                              Pre-   Post-    Pre-   Post-
                              ther-  ther-    ther-  ther-
                              apy    apy      apy    apy

X Social Conformity
   A  Women and marriage       17     14        7    [?]
   B  Job, vocation            13     12        6      6
   C  Other person             10     13        5      5
   D  Self-assertion            4      6        6      3
   E  Self-abasement            3      1        4      2
   F  Personal care             3      3        5      4
   G  Material gains            0      3        5      4
   H  Other answer              2      0        2      2
      M                       2.1    2.1      1.4    1.2
XI Positive Mental Health
   A  Intrapsychic functioning [?]   [?]        8      9
   B  Personal relationships


Secondarily, they wished to be dispensed something 
by the therapist. Conversely, the Low group ap- 
peared characteristically dependent and wishing to 
be given to, and, secondarily, they were involved in 
fear or anger. At the later phase, a comparable 
number of the High Group were involved in a pos- 
itive affective relationship with the therapist. The 
Low group had apparently learned that these ther- 
apists would not dispense help. They now saw them 
as depriving persons and became angry. It seems 
a reasonable hypothesis that the patient who begins 
treatment with a recognized emotional involvement 
finds the therapist ready to enter the relationship 
and, furthermore, capable of rechannelizing the 
available feeling. The demanding, dependent pa- 
tient, on the other hand, offers nothing of himself 
and expects the therapist to make the full effort to 
meet his demands. This the therapist is unwilling 
to do; the patient is frustrated and fails to develop 
into the sort of patient whom the therapist consid- 
ers successful. 


Q. 2. “I saw this patient as a person who
wanted ....”



Twenty of the Low group were seen by their 
therapists as having come to the Clinic primarily 
wanting to be given something. Most of these left 
with the same point of view, although some had by 
then wanted nothing at all from the Clinic. The 
wish to be given something was also characteris- 
tic of the High group, both at the beginning and 
at the end, but not to such an extreme degree. A 
substantial number of these were thought to be in- 
terested in expressing other feelings, something al- 
most entirely absent among the Lows. 

Q. 3. “This patient failed to show insight
into ....”

Q. 4. “This patient showed .... insight in-
to ....”


Therapists seemed most concerned that patients 
should have some insight into the meaning of their 
own interpersonal relations and that they should 
recognize symptoms as such. Estimates of the de- 
gree of insight were extremely modest, as indicated 
by the terms inserted in the blank space in Ques- 
tion 4. Characteristically, they noted that the pa- 
tient both failed to show and did show some in- 
sight into a particular aspect of his personality. 
The cases in which no insight whatsoever was 
observed were all in the Low group. The most 
marked change in the pretherapy and posttherapy 
ratings appears for the High group which had six 
patients listed as having some insight into the 
meaning of their own interpersonal relations at the 
beginning and twelve so rated at the end. 


Q. 5. “This patient handled his feelings by ....”

Responses to this question were typically in terms 
of the so-called defense mechanisms. More than one 
was usually mentioned for each patient. The larg- 
est frequency for both groups occurred in acting 
out at the beginning (12 Highs and 10 Lows). Al- 
though there was no difference in the two at the 
beginning, at the later rating, the Highs had a fre- 
quency of only 6, whereas that for the Lows was 
still 10. Repression was seen as significant for the 
Highs at the beginning, but negligible for both 
groups otherwise. 


Verbalization, rationalization, sublimation, and 
free expression characterized the High group at the 
posttherapy rating, in distinct contrast to their own 
pretherapy rating and to both ratings for the Lows. 
Presumably, these ways of behaving are accepted 
implicitly by the therapists as therapeutic goals. 
Fantasy and projection were the two mechanisms 
much more frequent among the Lows at the post- 
therapy rating than among the Highs. 

The average number of mechanisms listed at the 
beginning was 2.0 for the Highs and 1.6 for the 
Lows. On the posttherapy ratings, the Highs had 
1.6 and the Lows 1.3; i.e., the Highs exceeded the
Lows in number of defense mechanisms throughout. 
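
The pretherapy average for the Highs can be checked against the frequencies that remain legible in Table 2, Section V. This sketch rests on two assumptions: that the legible column is the High pretherapy column (it agrees with the 12 acting-out mentions cited above), and that each mean is the column total divided by the 25 patients in the High group.

```python
# Legible Section V frequencies, assumed to be the High pretherapy column.
high_pre_mentions = {
    "Acting out": 12,
    "Repression": 8,
    "Somatization": 6,
    "Withdrawal": 5,
    "Fantasy": 4,
    "Projection": 4,
    "Dissociation": 3,
    "Depression": 3,
    "Isolation": 2,
    "Verbalization": 1,
    "Rationalization": 0,
    "Sublimation": 1,
    "Free expression": 1,
    "Denial": 1,
}

N_HIGH = 25  # number of patients in the High group, as stated under Method

total_mentions = sum(high_pre_mentions.values())
mean_per_patient = total_mentions / N_HIGH
print(f"{total_mentions} mentions / {N_HIGH} patients = {mean_per_patient:.2f}")
```

On these assumptions the total is 51 mentions, or about 2.0 per patient, matching the mean reported for the Highs at the beginning.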


Q. 6. “This patient used/did not use his intel-
lect to ....”


Therapists much more often chose to refer to the 
patient’s use of intellect rather than to his failure 
to use it. “Did not use” was not chosen at all for 
the posttherapy ratings of the Highs. Use of the in- 
tellect for defensive purposes was important for 
both groups at both ratings. The greatest change 
was noted for the Highs, where the tallies for use 
of intellect to seek insight increased from 3 to 11. 


Q. 7. “This patient’s integration of feeling and
affect ....”


This question was the one about which the ther- 
apists reported most dissatisfaction. Nevertheless, 
the ratings and changes were in the expected di-
rections. At the beginning, the two groups were
very similar in the categories of minimal and poor.
The Highs decreased in these categories and in-
creased in adequate. The Lows tended to remain
the same. 


Q. 8. “This patient’s motivation(s) for treatment
was/were ....”


Differences in judgments of motivation at the 
two ratings were negligible for both groups. The 
crude category, need to change, had an overwhelm-
ingly larger number of frequencies than any other
category, and almost all these were drawn from the
High group. There was a strong suggestion that
the Low group was distinguished by a need to
maintain the status quo and by a lack of motiva-
tion to continue in treatment.


Q. 9. “In what specific ways does this patient ex-
hibit pathology?”


The average number of items listed for the Highs
was 3.4 at the beginning and 1.7 at the end; where-
as there were 2.7 for the Lows at both ratings. Hos-
tile interpersonal relationships and conflict directed
toward the self were most conspicuous among the
Highs at the beginning, followed by depression and
by somatic complaints. Distinctions among the cate-
gories were much less pronounced at the postther-
apy ratings. There were no increases and the most
conspicuous decreases were in hostility, depression,
and somatic complaints.

For the Lows, the most conspicuous category at
the beginning was conflict directed toward the self,
the total here being slightly higher than for
the Highs. This category continued high at the end
and hostility was almost twice as high as at the
beginning. The Lows had more mentions of psy-
chotic mechanisms at the beginning and about four
times as many as the Highs at the posttherapy
ratings.

Q. 10. “In what specific ways does this patient ex-
hibit conformity to society?”

The mean number of items mentioned here for
the High group was 2.1 at both ratings. For the
Lows, the mean was 1.4 on the pretherapy rating
and 1.2 on the posttherapy rating. None of the cate-
gories suggested any significant change. In the cate-
gories of marriage and employment, there were
about twice as many frequencies for the Highs as
for the Lows.


Q. 11. “In what specific ways does this patient
exhibit positive mental health?”

The mean number of items mentioned for the 
Highs was 1.4 at the pretherapy rating and 2.3 at 
the posttherapy rating. Comparable figures for the 
Lows were .8 and .9. 

All items given here were classified into two 
crude categories: those having to do with intra-
psychic functioning and those having to do with
personal relationships. The personal relationship 
items exceeded those for intrapsychic functioning 
for both groups at both ratings. The most conspicu- 
ous change was in personal relationships for the 
Highs, which had the highest frequency at the be- 
ginning and virtually doubled this at the postther- 
apy rating. 


Two diagnoses were available for each of 
these patients—the working diagnosis made at 
intake and that made especially for this study. 
The intake diagnosis may or may not have 
been made by the psychiatrist who became the 
therapist. Both diagnoses are summarized in 
Table 3 for comparison.


Table 3

Summary of Diagnoses

                               High          Low         Total
                            In-    Re-
Diagnosis                   take   search   I     R     I     R

Anxiety reaction             10      4     [?]   [?]   [?]   [?]
Schizophrenic reaction        5      5      9     8    14    13
Hysteric reaction             5      3     [?]   [?]   [?]   [?]
Epilepsy                      2      2      0     0     2     2
Obsessive-compulsive
  reaction                    1      2      1     1     2     3
Other neurotic and
  character reactions         2      9     [?]   [?]   [?]   [?]
Undetermined                  0      0      1     4     1     4


Data obtained from the case records have 
been summarized in the distinctly heterodox 
Table 4. On the whole, this table reveals dis- 
similarities in the two groups, but in regard 
to marital status, education, and percentage 
of disability they are substantially the same. 

The item of age presents one of the clear- 
est differences between the two groups, with 
the direction of the difference somewhat con- 
trary to expectations. Most successful patients 
were found to have a median age five years 
greater than that for the Low group. In 1949,
the usual Clinic patient was in his late twen-
ties. This fact is, of course, a function of his-
tory, since nearly all Clinic patients are vet-
erans of World War II. From Table 4, we
can see that our High group represented men 
who were among the first to be drafted, but 
who were already old enough to have had some 
experience as mature adults. The Low group,
in contrast, contained a large proportion of
men who were still in their teens and who
had known, while still in school, that their im-
mediate future was in military service. These
conditions may well be related to the finding
that there were more unemployed among the
Low group and that their occupational rating
was lower.


Table 4

Summary of Personal and Social Data

                                   High          Low

 1. Age in 1949 (median)            32            27
 2. Age at induction (median)       23            20
 3. Date of military
    induction (mode)               1941          1942
 4. Duration of service
    (median)                     33.5 mos.     27.5 mos.
 5. Number of individual
    interviews (median)             45             3
 6. Number of group
    sessions (median)               7*            1*
 7. Duration of clinic
    contact (median)             20.5 mos.      2.5 mos.
 8. Per cent disability
    (intake)                        30            30
 9. Occupational rating
    (median)                     Average      Below Avg.
10. Unemployed (intake)            24%           56%
11. Married (intake)               56%           48%
12. Education (mode)            12th grade    12th grade
                                 or more*      or more*

*Low Reliability.


It is interesting to speculate that the mere 
fact of added age and maturity was a deter- 
mining factor in the greater amenability to 
therapy of the High group. This can be tested 
in the future as all the men become older. 

The differences of number of individual in- 
terviews and duration of Clinic contact sug- 
gest that this is a powerful factor in determin- 
ing the therapist’s judgment of therapeutic 
success. Data presented for number of group 
sessions are quite unreliable, but do make clear 
the difference between the two groups. At 
this Clinic, patients rarely attend group ses-
sions without concurrent individual therapy.


Summary 


This study has been an attempt to explore 
empirically the concept of psychotherapeutic 
success and to clarify its operational meaning
within the context of a free outpatient clinic. 

The method used has been to obtain spon- 
taneous and questionnaire data from six psy- 
chiatrists within a single clinic concerning a 
group of patients judged by them to have been 
their most successful patients in therapy and 
a group judged least successful. These data 
were supplemented by information taken from 
the case records. 

Psychotherapy was defined at the beginning 
of the paper as a process of interpersonal re- 
lationship. From this definition, it follows 
that success is a function of both therapist and 
patient. Within the therapist, it takes the 
form of a judgment that “This patient has had 
a certain degree of success in therapy.” In order 
to formulate such a judgment, the therapist 
draws upon his total personality as expressed 
in explicit and implicit criteria which he has 
selected from among those offered him in his 
training and experience. Such criteria become 
most apparent when comparisons are made of 
extreme cases on the success continuum. 

To be sure, therapists’ criteria are typically 
expressed in terms of characteristics of the pa- 
tients. The question of the objective validity 
of their descriptions of the patients is open. 
Hence, we cannot say that patients with cer- 
tain characteristics will be successful. We 
must recognize the role of the therapist and 
limit ourselves to factors or characteristics 
which lead to judgments of success on the part 
of the therapists. 

Rather obviously, the present exploratory 
study can produce no conclusions. It does pro- 
vide us with fertile ground for the production 
of meaningful hypotheses which can be sub- 
jected to independent test. 


Hypotheses 


A. Patients are more likely to be regarded
by their therapists as successful in therapy
when they:

1. Enter treatment with negative affect al-
ready mobilized. Negative affect broadly
covers anxiety, fear, hostility, depression.
2. Are able to verbalize their feelings and
to use intellectual controls.
3. Are motivated by a need to change them-
selves.
4. Utilize somatization along with other
mechanisms.
5. Give up hostility, depression, and somatic
complaints.
6. Conform to a moderate or high degree to
social norms.
7. Exhibit positive mental health (self-devel-
opment beyond the demands of social con-
formity).
8. Develop positive affective relationships.
9. Are in the fourth decade of life.
10. Attend both group and individual therapy.
11. Continue treatment for a long period.
12. Are employed.
13. Have an average or better occupational
rating.

B. When the above conditions are not met,
the patients are more likely to be regarded as
of low success. In addition, the patient with
a demanding, dependent attitude is likely to
be given a low rating by his therapist. Fur-
thermore, this patient becomes frustrated, de-
velops negative affect, and discontinues treat-
ment.


The Practice and Problems of Clinical Psychology
in a State Psychiatric Hospital


Jules D. Holzberg 


Connecticut State Hospital 


State mental hospitals are responsible for 
the largest number of psychiatric patients in 
this country. Statistics for a recent one-year
period indicate that approximately seventy 
thousand beds were assigned for all types of 
patients in federal hospitals including those of 
the Veterans Administration [1]. Statistics 
for patients attending outpatient clinics are 
not quite so clear, but it is estimated that more 
than two hundred thousand patients were seen 
during this same period [10]. The number of 
patients in state hospitals, however, was well
over a half million [1]. In spite of this, fewer 
clinical psychologists are employed in state 
mental institutions than in the other two types 
of agencies [11]. 

Among the factors contributing to this dis- 
parity are the widespread negative attitudes 
held by many concerning the professional 
standards of state mental hospitals. That these 
attitudes are not wholly unfounded has been 
made clearly evident [4]. While professional 
standards have been improved in recent years 
[2], they are still such as to deter many psy- 
chologists from taking appointments in state 
hospitals. 

Another factor making for the limited num- 
ber of psychologists in state mental institutions 
has been the defeatist attitude of many psy- 
chiatrists toward this problem. The Group 
for the Advancement of Psychiatry, manifestly 
dedicated to the ideals of progress in psychia-
try, reflects this complacency in one of its re-
ports: “... there is no possibility of any ex-
tensive use of clinical psychologists in any func-
tion of the state hospital for some years, re-
gardless of policies, salary offered, or any oth-
er consideration” [11, p. 12].

Another factor has been the view of many 
psychologists [8] that the state hospital is a
custodial rather than a treatment institution,
with the result that younger psy-
chologists have tended to look upon mental
hospital patients as hopeless and incurable.
While the author accepts the fact that even the
best state hospitals have not completely over-
thrown the “custodial” concept, the emphasis
here is on the failure to stress the positive role
that psychology can play in contributing to a
change in the goals of the mental hospital.

Last, but by no means least, are the inade-
quate salaries that are paid psychologists in
many state institutions. The responsibility for 
this is shared equally by state personnel de- 
partments and state hospital administrators. 
State personnel departments have often paid 
little attention to the qualifications and train- 
ing required of psychologists in gauging salary 
levels, and many state hospital administrators 
have been content to accept their indifference 
without challenge. 

There is often a parallel between state hos- 
pital attitudes toward psychology and the con- 
ditions of care and treatment for patients in 
these institutions. Where there is active ther- 
apy, research activity, and a teaching program 
within the hospital, well-developed psycho- 
logical services will most likely also be found. 
If there is no interest in dynamic concepts of 
treatment, research, and training, one may ex- 
pect little interest in psychology. In many 
state institutions, psychologists are deliberate- 
ly excluded from employment. Where they 
are employed, frequently their services are but 
superficially accepted. The psychologist, in 
these instances, may find himself isolated 
from the rest of the professional staff and his 
effectiveness thereby seriously limited, but the 
hospital may paradoxically boast the facade
of “a psychologist on the staff.”

It is the purpose of this paper to present the
evolution and present status of a relatively new 
psychological program within a state mental 
hospital. It is hoped that this presentation may 
serve to stimulate discussion concerning the 
psychologist in the state mental hospital — his 
status, his functions, his contributions, and his 
problems. No single pattern of psychological 
organization will be suitable for all state hos-
pitals [7], but an exchange of experiences
in this important field of service and research
seems highly desirable.


The Hospital Setting 

The program to be described began in 1946 
when a psychologist was appointed to this hos- 
pital, the first such appointment since the 
founding of the institution 79 years before. 
Prior to the establishment of a psychology de- 
partment, psychometric services of a routine 
nature were performed by nonpsychologists. 

The Connecticut State Hospital is one of
three state institutions for the mentally ill in
the state of Connecticut and provides treat-
ment and care for approximately three thou-
sand patients. During the past year, there were
approximately 1000 admissions to this hospital,
covering the gamut of psychopathological dis-
orders. The functions of the hospital are to
provide diagnostic and treatment facilities for
mental patients and to conduct training and
research.

It was accepted from the beginning that the
role of the psychology department in a mental
hospital would necessarily have to parallel the
functions of the mental hospital if it was to
be integrated into the hospital organization.
Since the functions of the hospital were diag-
nosis, treatment, training, and research, the
psychology department was oriented to con-
tribute maximally to these functions. The
psychological program has grown out of arising 
needs, not by preconceived plan. The evolution 
of this program may be traced to those prob- 
lems which required the singular contribution 
of psychology, while providing at the same 
time opportunities for the individual psychol- 
ogists to find gratifications for their profes- 
sional needs. 

The department is presently organized with
a director, four staff psychologists, five psycho-
logical interns, two of whom receive stipends
from the United States Public Health Service,
and a secretary. The department is housed in
a building consisting of ten offices and a
library-conference room. The staff psychol-
ogists have similar functions but are assigned
to different parts of the hospital. One staff
psychologist is assigned to the male admission
service and another to the female admission
service. A third psychologist works in the con-
tinued treatment (chronic) service. A fourth
psychologist is assigned to the outpatient clinics
which serve the community surrounding the
hospital. The interns are rotated through these
services during the year. Emphasis is placed
upon fostering close relationships between the
psychologists and the professional members of
the service to which they are assigned.

All aspects of the staff psychologists’ re-
sponsibilities are under the supervision of the
director. It has been accepted as part of the
philosophy of this program that staff psychol-
ogists, regardless of their level of attainment,
shall receive professional supervision, both for
training purposes and to maintain and
sharpen the professional standards in the
department. This supervision has been pro-
vided through weekly individual conferences
and a weekly group conference of all the staff
psychologists.

The Psychological Program

Diagnostic testing. All patients admitted
during the previous 24-hour period are seen
at a staff conference each morning. At this
staff conference, all available information
about the patient is presented and the patient
is interviewed by the clinical director. The
professional staff, including the psychiatrists,
psychologists, and social workers, discuss the
probable diagnosis, the apparent dynamics of
behavior, the studies to be performed, and the
treatment to be instituted. It is at these staff
conferences that most referrals for psychologi-
cal testing services originate. The patients re-
ferred for psychological study are assigned to
the psychologists at the end of each week and
it is the psychologist’s responsibility to plan
his work week in accordance with his case
load.

After the completion of the psychological 
examination, a detailed psychological report is
prepared [5], the original being placed in the
patient’s medical record and a copy being sub-
mitted to the patient’s psychiatrist so that it
may serve as a basis for discussion between psy- 
chiatrist and psychologist. Approximately 
three weeks after admission, each patient is 
diagnosed and his progress evaluated at a diag- 
nostic staff conference. At this staff conference, 
the psychologist presents the results of his ex- 
amination to the professional staff. 

The emphasis in psychological testing is 
that the “. . . psychological test does not make 
a psychiatric diagnosis, but only contributes to 
it” [12, p. 3]. The term “diagnosis” as used 
by the psychologist here does not just connote 
nosology. Rather, it emphasizes an understand- 
ing of the patient’s illness through the psy- 
chological study of the processes underlying 
his behavior. 

Research. Research has been the second 
most important responsibility for the psychol- 
ogist from the point of view of time expendi- 
ture. In the evolution of the research program 
of the psychology department, research was 
initially handled as a responsibility of the psy- 
chologist which he would fulfill in addition to 
his clinical duties as he found time to do so. 
This is the usual manner in which research is 
organized in most clinical settings [8]. How- 
ever, it was soon found that organized and 
systematic research is very difficult, if not im- 
possible, on this basis. 

Since the hospital agreed that, “Any hos- 
pital or institution overlooks one of its im- 
portant assets, if it includes as a member of its 
staff a psychologist with research training and 
then, by requiring a full-time load of clinical 
service, excludes opportunity to utilize this 
training” [7, p. 643], it was necessary that 
research be organized in a different manner. 
Each psychologist was therefore given responsi- 
bility for clinical duties for two continuous 
weeks. On the third week, he was relieved of 
his clinical responsibilities (with the exception 
of therapy) and was expected to devote his 
energies to the research in which he was en- 
gaged. This plan in no way interfered with or 
reduced the total clinical responsibilities of the 
department. What it meant was that psychol- 
ogists would be called upon to carry greater 
clinical responsibilities on the weeks when they 
were assigned to clinical duties. However, the 
compensation for them was the realistic as-
surance that two weeks of clinical responsibili-
ties would be followed by one week of research.
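
The two-weeks-clinical, one-week-research cycle described above can be sketched as a simple rotation. The paper does not say how the research weeks of the several psychologists were arranged relative to one another; the staggered start used here, the function name, and the staff labels are assumptions made only for illustration.

```python
def weekly_assignment(week, psychologist_index):
    """Return 'research' or 'clinical' for a zero-based week number.

    Each psychologist follows a three-week cycle: two clinical weeks,
    then one research week. Assumption (not stated in the paper): the
    cycles are staggered by the psychologist's index so that research
    weeks do not all coincide.
    """
    return "research" if (week + psychologist_index) % 3 == 2 else "clinical"


# Hypothetical labels for the four staff psychologists.
staff = ["A", "B", "C", "D"]
schedule = {
    name: [weekly_assignment(week, index) for week in range(6)]
    for index, name in enumerate(staff)
}

for name, weeks in schedule.items():
    print(name, weeks)
```

Under this assumed staggering, at least two of the four psychologists are on clinical duty in any given week, which is one way the department's total clinical responsibility could remain covered.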

Apart from making it possible for organized 
research to get its roots into the departmental 
structure, this plan also served to provide a 
varied program for the psychologist and thus 
tended to reduce the natural problems of bore- 
dom and routine that frequently set in with 
continuous clinical work. All the staff psy- 
chologists and interns carry research on this 
basis and it would seem at present that this 
has been the most successful way to manage 
the problem of organized and systematic re- 
search within a clinical setting. While it would 
be impossible to describe here all the research 
in which the department has been or is en- 
gaged, it can be stated that the nature of the 
research activity has varied with the interests 
of the individual psychologists and the research 
demands of the hospital. 

With the development of a better organized 
research program, the department has utilized 
its affiliations with local universities to seek 
consultation from individual members of their 
psychology departments on problems en- 
countered in research. This development has 
proven to be very fruitful in meeting research 
needs. 

A need in the area of research which is felt 
very seriously is to bring to the present pro- 
gram the laboratory techniques and procedures 
of experimental psychology. It is therefore 
planned that within the next year this hospital 
will begin the development of an experimental 
psychological laboratory staffed by an experi- 
mental psychologist. Since it is probable that 
a state mental hospital will never be able to 
fully support an all-inclusive research program 
involving all the necessary research workers, 
it is an additional goal to encourage research 
workers in the affiliated universities to partici- 
pate in hospital research programs. 

Therapy. Psychologists are encouraged to 
carry patients in individual and group psycho- 
therapy under psychiatric supervision. While 
the urgency of the psychotherapeutic needs of 
the patients was the original basis for psycholo-
gists’ being encouraged to engage in therapy
here, it is also recognized that psychotherapy is
a most important vehicle for deepening the psy-
chologists’ understanding of psychopathology
and psychodynamics. An individual psycholo-
gist may express interest in working thera-
peutically with a patient, or the ward psychi-
atrist may recommend a patient to the psychol-
ogist. However, the final approval for assign-
ment of a patient for psychotherapy rests with
the clinical director. The psychologist meets 
regularly with the ward psychiatrist to discuss 
the progress of his patients. At periodic inter- 
vals, the psychologist prepares a progress note 
on each patient for inclusion in the medical 
record. Psychologists meet every week with the 
psychiatric consultant from Yale University 
School of Medicine for continuous case con- 
ferences on individual and group therapy. 
While the type of therapy practiced in the 
hospital is not of any one school, therapy is of 
an intensive sort with emphasis on reality 
problems. 

With regard to therapy, it is not intended 
that the psychologist shall relinquish his unique 
contributions in diagnostic testing and research 
by devoting more than a portion of his time 
to treatment [6]. 

Training. The psychology internship train- 
ing program will not be discussed in this paper 
since it will be the subject of a report in the 
near future. Psychology students from neigh- 
boring universities receive practicum training 
and conduct research at the hospital under the 
supervision of the psychology department. Stu- 
dent nurses, attending the School of Nursing 
at the hospital for a three-month course in psy- 
chiatry, receive formal lectures in elementary 
principles of behavior and clinical psychology. 
The University of Connecticut and the hos- 
pital provide one year of training in advanced 
psychiatric nursing. Students taking this course 
receive five months of training at the hospital 
and while there receive more advanced instruc- 
tion in psychological principles and techniques. 
A course in principles and techniques of clinical 
psychology is given annually to all new psychi- 
atric residents in training at the hospital. 

The hospital is a field training agency for 
the Yale University School of Medicine, and 
third-year medical students are sent there 
throughout the school year for five weeks of 
training in psychiatry. The psychology de- 
partment is responsible for their training in the 
purposes and procedures of clinical psychology 
in psychiatric practice. The students have 
weekly discussions with the director of the de-
partment on psychological problems. They also
observe the psychological examinations of pa-
tients and discuss their observations with the
staff psychologists.

Formal relationships have been established
with the Department of Psychiatry of the Yale
University School of Medicine and the Depart-
ments of Psychology of Wesleyan University,
Yale University, and the University of Con-
necticut, at each of which the director is a
member of the faculty. To further strengthen
these relationships, the Board of Trustees of
the hospital has created an advisory committee
to the psychology department consisting of a
representative from each of the affiliated uni-
versities. This committee meets once or twice
annually with the superintendent and the di- 
rector of the department to advise on problems 
of the psychology department. 

For two consecutive years, the psychology 
department has sponsored a series of lectures 
for psychiatrists on psychological principles of
behavior. It is expected that such a lecture
series will be a regular feature of the annual
Postgraduate Seminar in Neurology, Psychia-
try, and Related Fields of Medicine sponsored
by the Joint Committee of State Mental Hos-
pitals and the Yale University School of Medi-
cine.

Community services. The hospital operates 
mental hygiene clinics whose permanent staff 
consists of two psychiatrists, a psychiatric social 
worker, a staff psychologist, and an intern psy-
chologist. The function of these clinics is to
offer psychiatric services to individuals residing 
in the community. Here the psychologist’s re- 
sponsibilities are very similar to those of the 
psychologists in the hospital except that a 
greater proportion of his patients are children 
and adolescents. 

In addition to psychological testing, the 
clinic psychologist carries carefully selected pa- 
tients for individual and group therapy. This 
has permitted the psychological services in the 
clinics to move away from being chiefly a diag- 
nostic service to one that is offering therapeutic 
opportunities to greater numbers of individuals. 

The department has also carried responsi-
bility for the education of lay groups through
courses in adult education programs, lectures 
to Parent-Teacher Associations and other com- 
munity groups, and radio talks. 

Personnel testing. Another responsibility 
that the psychology department carries is that 
of testing new employees of the hospital who 
work either on the wards or in the kitchens. 
In both of these places, large numbers of pa- 
tients come into contact with employees. This 
program grew out of the need to assist in the 
early detection of problem employees. The re- 
sults of the tests are used by the heads of the 
nursing and dietary departments to guide them 
in the placement of their employees. The value 
of these tests for predicting successful ward 
attendants has been studied [9]. 

Vocational rehabilitation. The hospital is at 
present the site of a unique experiment wherein 
a vocational counselor of the Connecticut State 
Rehabilitation Division has been assigned full 
time to the hospital to work with psychiatric 
patients in preparation for future employment. 
As a consequence of this program, the psy- 
chologists have been engaged in extensive vo- 
cational testing of psychiatric patients. The 
results of the tests and their interpretation are 
submitted in a report to the vocational coun- 
selor who uses them in his planning with the 
patient. A copy of the vocational study pre- 
pared by the psychologist is also filed in the 
medical record so that the psychiatrist may be 
aware of aspects of the patient that were not 
tapped in other psychological studies. 


Problems 


Until recently, one of the serious problems 
facing psychologists in Connecticut’s state men- 
tal hospitals has been the very low requirements 
that have been set for psychological positions 
and the accompanying inadequate salaries. 
This problem was, of course, not peculiar to 
mental hospitals in this state but constitutes a 
national problem of major importance. As of 
January 1, 1950, all psychological positions in 
state service have been reclassified with ac- 
companying salary increases through the efforts 
of the Connecticut State Psychological Society 
[3]. This step has gone a long way toward 
meeting the problem of shortage of staff. 

The present attitude of the psychological 
profession that the doctoral degree is the mini-
mal training requirement for a clinical psychol-
ogist constitutes a problem for state hospitals.
The present attitudes of psychologists toward
state hospitals are such that even with improve-
ment in salaries there is not likely to be any
wholesale movement of Ph.D. psychologists in- 
to staff positions below the level of chief psy- 
chologist. While there may be some psycholo- 
gists who will insist that the needs of society 
not thwart psychology’s training goals, the psy- 
chological needs of the mentally ill in state hos- 
pitals cannot be ignored. During the period 
that more Ph.D. psychologists are being 
trained, it is the writer’s conviction that state 
hospitals must continue to recruit the best 
available people, and these are likely to be at 
a Master’s degree level. When such people 
are employed, it will be necessary to provide 
the supervision and training that will permit 
the staff to grow, and to encourage those of 
demonstrated competence to continue on 
toward their doctoral degrees. 

One problem that has been of concern to 
this hospital has been that of determining mini- 
mal standards of psychological service in a state 
mental hospital in terms of the minimal num- 
ber of staff psychologists to be employed. The 
Veterans Administration has set minimum 
standards of one psychologist to every fifty 
acute patients and one to every three hundred 
chronic patients [11]. The basis for these 
ratios is not indicated, and consequently one is 
not certain that these ratios should be set for 
state hospital patients. At any rate, it is felt 
that the problem of minimal standards is not 
unique for this hospital, but constitutes a prob- 
lem for all state hospitals. It would seem de- 
sirable that the American Psychological As- 
sociation, in collaboration with the American 
Psychiatric Association, assume responsibility 
for determining appropriate minimal standards 
for clinical psychology in state mental hospi- 
tals. Such standards would be of considerable 
importance as guides to planning. 
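
As a rough illustration of how the Veterans Administration ratios quoted above would translate into a staffing figure, the following sketch computes the minimum number of psychologists from patient counts. The function name is hypothetical, and the division of a patient population into acute and chronic groups in the example is assumed for illustration only; it is not a figure from this hospital.

```python
def minimum_psychologists(acute_patients, chronic_patients,
                          acute_per_psychologist=50,
                          chronic_per_psychologist=300):
    """Minimum staff under ratios of one psychologist per 50 acute
    patients and one per 300 chronic patients, rounded up.

    Integer arithmetic is used so that the rounding is exact.
    """
    numerator = (acute_patients * chronic_per_psychologist
                 + chronic_patients * acute_per_psychologist)
    denominator = acute_per_psychologist * chronic_per_psychologist
    return -(-numerator // denominator)  # ceiling division


# Assumed split of a three-thousand-patient hospital (illustrative only).
print(minimum_psychologists(acute_patients=500, chronic_patients=2500))
```

Because the result depends heavily on how patients are classified as acute or chronic, the same ratios can imply quite different staff sizes for hospitals of equal census.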


Conclusions 


This paper emphasizes that psychology can 
make significant contributions to state mental 
hospitals. Psychologists should be encouraged 
to work in these institutions as fruitful fields 
for service and research. A greater interest on 
the part of the profession in the problems of
psychology in state hospitals will yield a social
return in the advancement of the treatment, 
research, and training goals of mental hospitals. 


Received May 29, 1951.


References 


1. Arestad, F. H., Leveroos, E. H., Albus, W. R.,
& Corbett, W. W. Hospital service in the
United States. J. Amer. med. Ass., 1948, 137,
1386.

2. Blain, D. (Ed.) Better care in mental hospi-
tals. Washington: American Psychiatric Asso-
ciation, 1949.

3. Cotzin, M., & Holzberg, J. D. Reclassification
of state-employed psychologists in Connecti-
cut. Amer. Psychologist, 1950, 5, 656-659.

4. Deutsch, A. The shame of the states. New
York: Harcourt, Brace, 1948.

5. Holzberg, J. D. (Ed.) Case reports in clinical
psychology. Brooklyn: Kings County Hospital,
July 1951.

6. Krugman, M. The evolution of the clinical
psychologist. Amer. J. Orthopsychiat., 1949,
19, 29-31.

7. Landis, C., & Kinder, E. F. Clinical psychology
in the state hospital. Psychiat. Quart., 1948,
22, 641-645.

8. Wechsler, D. The psychologist in the psychi-
atric hospital. J. consult. Psychol., 1944, 8, 281-
28

9. Yerbury, E. C., Holzberg, J. D., & Alessi, S.
L. Psychological tests in the selection and
placement of psychiatric aides. Amer. J. Psy-
chiat., 1951, 108, 91-97.

10. Directory of psychiatric clinics. New York: Na-
tional Committee for Mental Hygiene, 1948.

11. Public psychiatric hospitals. Topeka: Group
for Advancement of Psychiatry, 1948.

12. The relation of clinical psychology to psychi-
atry. Topeka: Group for Advancement of Psy-
chiatry, 1949.











The Behavioral Symptoms for Certain 
Organic Psychoses 


J. R. Wittenborn


The present report describes an investigation
which sought evidence for symptomatic simi- 
larity among mental hospital patients having 
a particular organic diagnosis. When this 
study was begun, it was supposed that certain 
organic diagnostic groups were symptomatical- 
ly homogeneous. If diagnostic groups were dis- 
tinctive in this respect, it should be possible to 
indicate the particular combination of symp- 
tom cluster scores which the clinician could in 
the examination of subsequent patients take as 
an indication for the organic diagnosis in ques- 
tion. If, however, there is considerable symp- 
tomatic diversity among patients who have a 
particular organic diagnosis, the nature and ex- 
tent of this diversity may be of interest because 
of the practical significance of the disease 
which the diagnosis implies. 

The sample comprised a group of 20 newly
admitted patients at the Connecticut State 
Hospital. There were four alcoholic patients 
with Korsakoff’s syndrome, four alcoholic pa- 
tients with delirium tremens, four epileptic pa- 
tients, four paretic, three patients with Pick’s 
disease, and one with Alzheimer’s disease. It 
was possible to correlate each of these patients 
with every other one on the basis of the simi- 
larity or dissimilarity between them with re- 
spect to a standard set of 55 symptom rating 
scales [2]. The rating scales describe behavior 
symptoms only and are most suitable for de- 
scribing the currently discernible pathological 
behavior of patients suffering from functional 
disorders. Accordingly, the similarities or dis- 
similarities revealed in the present data should 
not be generalized to all kinds of evidence used 
in differential diagnosis but only to the cur- 
rently discernible behavioral pathology of men- 
tal hospital patients. 

Thurstone’s method of factor analysis was
used to classify the patients on the basis of
their similarities and dissimilarities. If the or-
ganic diagnoses are descriptively sufficient,
factor analysis should reveal groupings of pa-
tients which correspond respectively with the
patient’s diagnosis. If, however, there were
more factors (groups of patients revealed by
the factor analysis) than there are diagnostic
categories employed in the present sample, or if
the grouping of patients revealed by the factor
analysis did not correspond with a grouping
based on the diagnostic labels, one would be
obliged to conclude that from the standpoint of
symptomatic behavior, at least, the organic di-
agnoses are descriptively insufficient. Under
these conditions, it would also be unprofitable
to attempt to determine precisely the symptom
pattern which characterizes each of these or-
ganic diagnoses.

The intercorrelations for the 20 patients
based on a standard set of symptom rating
scales were factor analyzed by Thurstone’s
method. The resulting centroid factors were
rotated orthogonally by the usual criteria for 
maximizing the number of zero loadings and 
minimizing the number of negative ones. The 
resulting rotated factor matrix is shown in 
Table 1. From a casual inspection of this table, 
it is apparent that more factors were required 
to account for the intercorrelations among the 
patients than there were diagnostic groups. It 
is also apparent upon further inspection that 
the rotated factors, although relatively distinct 
and clear-cut, do not correspond with any par- 
ticular diagnostic group, i.e., patients with the 
same diagnosis seem unlikely to have their 
highest loadings in the same factors. 
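
The first steps of such an analysis, correlating patients with one another across the rating scales and extracting a first centroid factor, can be sketched as follows. This is a minimal illustration and not the computation reported here: the ratings are invented, unities are placed in the diagonal of the correlation matrix, and the sign reflection, residual extraction, and rotation that a full centroid analysis requires are omitted. All function and variable names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length rating profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def patient_correlations(ratings):
    """Correlate every patient with every other across the scales."""
    return [[pearson(p, q) for q in ratings] for p in ratings]


def first_centroid_loadings(r):
    """First centroid factor: each column sum of the correlation
    matrix divided by the square root of the grand sum."""
    column_sums = [sum(row[j] for row in r) for j in range(len(r))]
    grand_sum = sum(column_sums)
    return [s / sqrt(grand_sum) for s in column_sums]


# Invented ratings: rows are patients, columns are rating scales.
ratings = [
    [3, 2, 0, 1, 3],
    [3, 3, 0, 0, 2],
    [0, 1, 3, 2, 0],
    [2, 2, 1, 1, 3],
]
r = patient_correlations(ratings)
loadings = first_centroid_loadings(r)
```

In this arrangement the patients, not the rating scales, are the variables being correlated, so patients with similar symptom profiles receive loadings of the same sign on a factor and a patient with the opposite profile receives a loading of opposite sign.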

The symptoms which are descriptive of each 
of the patient groups revealed by the factor 
analysis are of some interest. The character- 
istic symptoms for each factor were determined 
by scoring each patient’s symptom rating 
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‘Table | 


Rotated Factor Loadings 


Patient 


No Age Sex Diagnosis 
\ +6 l Pick’s 
2 3 I Epilepsy 
| Epilepsy 
' } M Paretic 
44, | Alch. Kor 
( \ Mi Epilepsy 
} M Pareti 
x M kL pilepsy 
) 6 l Pick’'s 
1 +4, vi Alch. DT 
11 62 M Pick’s 
12 + M Alch. DT 
13 +8 M Paretic 
i4 +7 M Paretic 
15 60 M Alzheimer 
1¢ +0 M Alch, DT 
l $2 M Alch. DT 
i8 +7 M Alch.Kor 
19 19 M Alch.Kor 
20 f M Alch.Kor 


scales for the nine different symptom cluster 
scores which form the basis for the quantified 
[1]. These 


cluster scores have been described elsewhere 


multiple diagnostic procedure 


and are tentatively labeled on the basis of the 
diagnostic stereotype which they most resemble 


[2,3]. 


manic state, depressed state, conversion hys- 


They are specifically the anxiety state, 


teria, paranoid schizophrenia, paranoid condi 
tion, schizophrenic excitement, deteriorated 
hebephrenic schizophrenia, and phobic compul- 
sive scores. Each of the factors in Table 1 will 
be examined by referral both to the diagnosis 
of the patients who have high loadings with 
the respective factor and to the characteristic 
symptom cluster scores of the patients who have 
high loadings for the respective factor. Factor 
I comprises a patient with Pick’s disease, a 
paretic patient, one with epilepsy, one with 
Alzheimer’s disease, and a patient with Kor- 
sakoff’s syndrome. All the diagnostic groups 
except delirium tremens are represented in this 
factor. The most conspicuous cluster score for 
the patients who have high loadings on this 
factor is depressed state. The second most con- 
spicuous one is excitement. This suggests a 
group of patients characterized by both excite- 
ment and depression, or an agitated depression. 
These patients all had low scores for the paranoid, manic, and phobic compulsive clusters.


Psychoses 5
i il 1 vi
OF0 162 07 j LE
399 09 41Y 179 )
105 20% 150 1"
068 056 16 2% / 4
173 os if O91 /
187 408 052 14 f oy
555 Ol¢ Oy 14
0 & 12 Lit } ;
210 #46 008 15 /)
VOU UUU QOU {

) 146 /

3¢ 72 462 ( j
023 Ol¢ 40 4]
199 39] 4 f
613 O98 48] AT
116 034 4 4
121 042 ?
168 153 " r
157 752 ! y
201 728 ! ;


Three of the patients with high loadings on Factor II have Korsakoff’s syndrome, two are epileptic, one has Pick’s disease. Symptomatically, however, these patients are alike: they have high scores on the conversion hysteria cluster only.

Factor III is made up of a patient who has epilepsy, another with Pick’s disease, and another with Alzheimer’s disease. All three of these patients are characterized by relatively high scores on four different symptom clusters: excitement, depression, the deteriorated or hebephrenic type of schizophrenia, and conversion hysteria. These patients tend to have low scores on the paranoid and anxiety clusters.

Factor IV is made up of two patients with delirium tremens and one with paresis. The principal symptom cluster score for these patients is paranoid schizophrenia. In addition they are characterized by relatively high scores on the schizophrenic excitement and on the conversion hysteria clusters. These three patients are also characterized by very low scores on the manic state cluster, depressed state cluster, and anxiety state cluster.


1A moderately high score on this cluster can result from physical symptoms without an hysterical basis.


Factor V is based primarily on two paretic patients. These patients have high scores on all of the symptom clusters except those which are primarily neurotic in nature, i.e., anxiety state, conversion hysteria, and phobic compulsive.

Factor VI is of particular interest in the present study because the patients who comprise it vary greatly in their diagnosis but they are all women. Interestingly enough there seems to be no very conspicuous common symptom cluster score for these four female patients. To some degree all four patients are characterized by moderate scores on the paranoid schizophrenic cluster and moderate scores on the schizophrenic excitement cluster but this is not conspicuous. A scrutiny of the cluster scores for this group reveals that all of the four patients have a number of important similarities with respect to symptom cluster scores but the group is not uniformly similar. (This could be due to a pattern of symptom similarity among these patients which does not correspond with any of the cluster scores, but it probably illustrates a limitation of measures of relationships well known to all who work with numerous variables simultaneously, i.e., a set of variables may be positively interrelated because of overlapping elements and not necessarily because of a common element or elements.)

On the basis of the foregoing analysis it is 
apparent that many of the diagnostic groups 
comprise patients who are not characterized 
by any conspicuously consistent pattern of 
symptoms. Although three of the four patients 
with Korsakoff’s syndrome have most of their 
common factor variance in one factor, this 
factor also includes three patients without
this diagnosis. Similarly only two of the four
paretic patients have high loadings on the same
factor and only two of the four patients who
have delirium tremens have high loadings on
the same factor.


Summary and Conclusions 


Twenty patients drawn from several well-known organic diagnostic groups were classified on the basis of their symptom similarity. The method of classification involved a factor analysis of intercorrelations among the patients. The correlations were based on a set of 55 symptom rating scales. Within the limitations of the sample of patients and the set of rating scales the following indications result:

1. Patients suffering from the specified organic psychoses are not symptomatically homogeneous.

2. A quantified multiple psychiatric diagnosis based on symptom cluster scores is more descriptive of the symptoms of the patients than a diagnostic label. Apparently, etiological diagnoses have very little, if any, descriptive value.
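
The classification method summarized above correlates patients with one another across the rating scales before any factoring is done. A minimal Python sketch of that first step follows. It is an illustration under stated assumptions, not the authors’ computation: the ratings are randomly generated stand-ins, the 0 to 3 rating range is assumed, and the function names `pearson` and `patient_intercorrelations` are hypothetical. The factoring itself is not shown.

```python
import random
from statistics import mean


def pearson(x, y):
    """Product-moment correlation between two equal-length rating profiles."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def patient_intercorrelations(ratings):
    """ratings[p] holds the scale ratings of patient p; returns a patient-by-patient matrix."""
    n = len(ratings)
    return [
        [1.0 if i == j else pearson(ratings[i], ratings[j]) for j in range(n)]
        for i in range(n)
    ]


# Made-up data: 20 patients rated on 55 scales (assumed range 0 to 3).
rng = random.Random(0)
ratings = [[rng.randint(0, 3) for _ in range(55)] for _ in range(20)]

r = patient_intercorrelations(ratings)
print(len(r), "x", len(r[0]), "matrix of correlations among patients")
```

Each entry `r[i][j]` expresses how similar the symptom profiles of patients i and j are. It is this 20 by 20 matrix among patients, rather than a matrix of correlations among scales, that would be submitted to factor analysis.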
The Effect of Rater Differences on Symptom
Rating Scale Clusters


J. R. Wittenborn,¹ Marvin I. Herz, Kenneth H. Kurtz,
Wallace Mandell, and Sherman Tatz


Yale University


In earlier studies intercorrelations among a set of 55 symptom rating scales have been subjected to factor analyses. One of the analyses was based on symptom ratings made by psychiatrists at the Northampton Veterans Hospital for patients who had been hospitalized for various lengths of time, who were heterogeneous with respect to age, and whose illness was considered in some cases to be due to functional factors, in other cases due to organic factors [2]. The symptom clusters found in this sample were, with one exception, the same as the symptom clusters revealed in a second sample of patients. The second sample of patients was rated at the time of admission at the Connecticut State Hospital at Middletown and comprised patients considered to be suffering from functional disorders only and who were under 60 years of age [3].

Because of the great similarity in the symp- 
tom clusters revealed in these two studies, it 
was tentatively concluded that in mental hos- 
pitals in the northeastern part of the United 
States, at least, a clustering phenomenon could 
be assumed to exist among the symptoms mani- 
fested by patients and that the general nature 
of this clustering tendency was likely to be 
uniform in many respects from hospital to hos- 
pital. This tentative conclusion seemed to be 
reasonable in view of the fact that the clus- 
tering difference between the two samples was 
considered to be related to a known difference 
in the sample. Specifically, in the Northamp- 
ton sample, which comprised patients who had 


1The support of Dr. Mark A. May, Director of 
the Institute of Human Relations, Yale University, 
is gratefully acknowledged. This report is based on 
a student project in the senior author’s class in 
statistics. 


been hospitalized for some period of time, there was a cluster of symptoms which seemed possibly to indicate a deteriorated schizophrenia of the hebephrenic type. This cluster was not found in the Middletown sample which comprised patients who were rated shortly after their arrival at the hospital.

At the time these original studies were submitted for publication, it was considered quite possible, despite the general consistency between two samples of patients from different hospitals and with different raters, that some individual psychiatrists would differ from each other sufficiently in the observation of symptoms and the manner in which they interpreted them to make doubtful the general use of a procedure based on symptom rating scales. For this reason, it was decided to select 20 symptom rating scales which represented the most important symptom clusters found in earlier studies and to factor analyze intercorrelations for these symptom rating scales for two different samples of patients. One sample of patients would be rated by one psychiatrist and the other sample of patients would be rated by another psychiatrist.

In order to maximize any difference in symptom clustering due to differences in psychiatrists, the psychiatrists who provided the ratings were selected in a way calculated to maximize the differences in their scale ratings. It was possible to find two psychiatrists who were rating similar patients but who differed in age, cultural background, theoretical bias, and training.


The Analysis 


The 20 selected symptom rating scales were intercorrelated separately on the basis of the ratings provided by each of the two psychiatrists. For the purposes of discussion, the psychiatrist whose background and training most resembled the American stereotype will be designated as Dr. A; the psychiatrist whose background differed most from American stereotype will be designated as Dr. B. Dr. A provided 79 symptom rating scales, and Dr. B provided 119. The intercorrelations were factor analyzed by Thurstone’s centroid method and were orthogonally rotated by an employment of the usual criteria of maximizing the number of zero loadings and minimizing the number of negative loadings.




In Table 1 the various symptom rating scales are organized on the basis of the original symptom clusters to which they are related, and the factor loadings for the sample provided by Dr. A are presented with the factor loadings for the sample provided by Dr. B. In evaluating the similarity between the factor loadings for the A sample and the factor loadings for the B sample, the reader should consider sources of discrepancy which would be independent of any systematic differences between the raters. Among such sources of discrepancy may be mentioned the limited reliability of both rating scales and correlation coefficients and the variation among workers in order of extracting factors and manner of selecting rotations. Despite the possible detracting effect of these and conceivably other factors, the cluster patterns which exist among the symptom ratings for Dr. A and for Dr. B are remarkably similar and cannot be employed as a challenge to the hypothesis that symptom clusters among similar groups of mental hospital patients are relatively independent of the rater. This suggests that a quantified multiple psychiatric diagnostic procedure based on a scoring of a set of symptom rating scales may be reliably employed by psychiatrists who differ from each other in background, training, and point of view.² In this respect, at least, the new descriptive diagnostic procedure may be employed with greater confidence than the usual descriptive labels of psychiatric diagnosis.


Table 1

Rotated Factor Loadings for 20 Symptoms Rated Separately by Dr. A and Dr. B

Factors, in column order: I (A, B), II (A, B), III (A, B), IV (A, B), V (A, B), VI (A, B)

2. Ideas change with spontaneous rapidity: 39 221 222 -013 51 136 154 -014 602 739 039 067
5. Delusional belief that he is evil: -119 073 527 814 095 -224 -097 043 353 044 -032 065
6. Gives in easily to others: 272 146 10 393 -649 -412 137 022 287 -176 072 -071
8. Unaware of the feelings of others: 775 67 -086 -046 014 076 027 033 066 223 -111 045
9. Use made of physical disease symptoms: -170 -254 019 180 103 255 88 598 ++ 100 218 -054
12. Temper tantrums: 440 365 219 138 é 684 -061 152 54 038 190 196
13. Avoids people: 447 613 129 082 0 -345 057 077 -013 -043 151 250
14. Shouts, sings, and talks loudly: 262 107 -063 230 580 592 174 012 413 398 013 084
16. Incontinent because of own negligence: 695 72 -282 -026 45 -070 141 079 047 187 -174 -027
18. Feelings of impending doom: 132 109 606 689 209 -190 076 -017 -106 023 386 357
22. Cannot believe that he can be helped: 266 149 652 724 -076 -083 095 026 012 -005 069 038
24. Patient’s thinking clearly delusional: 157 361 283 063 -068 095 038 -159 146 627 642
25. No organic pathology with emotional basis: 057 -087 191 -141 -07 204 738 814 111 -041 103 033
26. Feels systematically persecuted: 047 031 417 311 117 370 -123 144 -024 027 657 601
27. Believes others influence him: -034 -002 090 234 -208 093 008 -015 087 020 759 733
29. Organic pathology with emotional basis: 176 006 138 -204 -075 090 518 675 -219 -086 026 299
32. All overt activity is at a minimum: 251 499 022 287 -642 -507 -071 031 219 -084 133 -142
36. Great variation occurs in rate of speech: 156 471 -050 123 324 070 001 082 577 552 -029 -008
37. Initiates physical assaults: 375 335 038 278 531 675 -041 -059 50 14 209 153
55. Characteristically oppositional: 612 392 -067 135 347 336 018 039 483 362 190
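
The similarity between the two raters’ loadings in Table 1 is judged in the text by inspection. One way to put a number on such a comparison is a coefficient of congruence between the two columns of a factor. This index is swapped in here for illustration and is not a computation reported by the authors. The sketch below uses the Factor II loadings of eight scales whose Table 1 entries are fully legible, and it assumes that the printed three-digit entries are thousandths with the decimal point omitted. The function name `congruence` is hypothetical.

```python
from math import sqrt

# Factor II loadings for scales 5, 14, 18, 22, 26, 27, 29, and 36 of Table 1.
# Assumption: printed entries such as 527 stand for .527.
factor2_dr_a = [0.527, -0.063, 0.606, 0.652, 0.417, 0.090, 0.138, -0.050]
factor2_dr_b = [0.814, 0.230, 0.689, 0.724, 0.311, 0.234, -0.204, 0.123]


def congruence(x, y):
    """Coefficient of congruence: sum(x*y) / sqrt(sum(x^2) * sum(y^2))."""
    cross = sum(a * b for a, b in zip(x, y))
    return cross / sqrt(sum(a * a for a in x) * sum(b * b for b in y))


phi = congruence(factor2_dr_a, factor2_dr_b)
print(f"Congruence of Factor II loadings, Dr. A vs. Dr. B: {phi:.3f}")
```

Values near 1 indicate that the two raters’ loadings follow the same pattern across scales. Because only a subset of scales is used, the result describes that subset and not the full 20-scale comparison.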
Despite the various differences between the two psychiatrists who provided the data for the present analysis, they were similar in one important respect; they made the ratings willingly and in good faith. Although the reader may feel reassured that rater differences such as may ordinarily be encountered among psychiatrists will not obviate the value of a rating scale procedure, the use of rating scales by raters who are profoundly doubtful of the appropriateness of rating scales for evaluating any aspect of patient’s behavior or who apply the rating scales under duress should be tactfully avoided. It is impossible to construct a rating scale in a manner such that its proper use could be guaranteed. “Deadlocks” of ambiguity and “pitfalls” of interpretation can always be found by those who seek them.


2Although it now seems unlikely that the qualitative features of a quantitative diagnostic profile are a result of the psychiatrist’s interpretation and it now seems likely that a cluster has the same kind of meaning from rater to rater, it is still quite possible that there are systematic differences between raters; i.e., some raters may in general rate “high” while others in general rate “low.”


Summary and Conclusion 


The present report describes an analysis conducted for the purpose of determining the degree to which the pattern among psychiatric symptoms is determined by known differences in raters. Ratings were provided by two psychiatrists whose interests and background were known to be different. The ratings were intercorrelated and analyzed separately. Clusters revealed by the two analyses were mutually consistent and similar to the clusters which had been found originally in large samples of patients rated by a variety of psychiatrists. These data do not challenge a claim that the qualitative behavioral significance of the cluster scores used in the quantified multiple diagnosis is relatively independent of ordinary differences between psychiatrists.
Ry, Pi i; } 19 ] 
The Effects of Emotional Adjustment and
Intelligence Upon Bellevue Scatter¹


Jack J. Monroe 
Since the publication of the Wechsler-Bellevue Intelligence Scale over ten years ago, no single problem in the area of clinical research has been so widely studied as has that of psychometric scatter, and few have yielded more inconsistent results. The first evaluations of scatter on the Bellevue Scale have been attributed to Gilliland [4], who, using samples of psychotics and normals, concluded that intertest variability was approximately 35 per cent greater in the psychotic group than could be predicted from Wechsler’s standardization data. Subsequently, however, the same author, collaborating with Wellman and Goldman [5], in a similar study, found no statistically significant results. Rabin [7] has been credited with devising a “schizophrenic ratio,” which was a scatter index for which he claimed success in differentiating schizophrenics from normals and neurotics, and in a later study [8], from manic-depressives. Webb [15], however, has demonstrated the questionable value of Rabin’s ratio by applying it both to schizophrenics and to normals taken from Rapaport’s experimental and control groups [9], where he found the ratio to be somewhat normally distributed with means which quite easily could have arisen from a homogeneous population. Wechsler [16], in the third edition of the Bellevue manual, proposed a scatter index for measuring mental deterioration. Schlosser and Kantor [10] have recently questioned its statistical significance in differentiating schizophrenics from psychoneurotics. No statistically significant differences were found between the deterioration scores of psychoneurotics and those schizophrenics among whom deterioration might be expected (simple, catatonic, and hebephrenic types); nor were there significant differences between the scores of paranoid schizophrenics, where deterioration might be least expected, and the residual schizophrenic groups. Rapaport and his associates [9], while at the Menninger Clinic, made a detailed statistical analysis of scatter patterns of selected psychotics, neurotics, and normal control groups. They demonstrated statistically reliable differences between these groups with respect to intertest variability. Wittenborn [17] has quite recently subjected some of the implications of Rapaport’s conclusions to a rather rigid statistical test, by stating certain hypotheses relevant to variability patterns and testing them under conditions favorable to Rapaport’s claim. No strong evidence was found for any of the hypotheses.


1This study was conducted under the co-direction of Professors John M. Hadley and E. J. Asher, Purdue University, West Lafayette, Indiana, in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

There are clearly conflicting findings from
investigations conducted in this problem area 
of psychometric scatter. Some evidence has re- 
cently appeared in the literature [1,2,3,6] to 
support the hypothesis that the intellectual lev- 
el of the subject may be related to his vari- 
ability on the Wechsler-Bellevue Scale. If 
this hypothesis were found to be true, it might 
explain some of the inconsistencies between re- 
sults of investigations in this area, since level 
of intelligence of subjects has seldom been con- 
trolled in these studies. 

The present study is an exploratory analysis 
of variance in which the main effects of the 
three independent variables of adjustment, in- 
telligence, and geographical location upon in- 
tra-individual scatter are studied. First-order 
interactions between pairs of variables are also
estimated. 


Procedure 


Individual standard deviations on the Bellevue Scale were computed for 352 subjects (144 schizophrenics, 136 psychoneurotics, and 72 well-adjusted normals). These variability scores were ordered to a 3 × 3 × 2 factorial design, classification being made with respect to three independent variables of adjustment, intelligence, and locality. These variables were isolated and broken down in the following manner:

(a) The locality variable arose by virtue 
of the fact that the total sample consisted of 
approximately equal subsamples taken from 
two different geographical areas. Hereafter, 
these subsamples will be referred to as the Kansas
and Indiana samples. The Kansas sample
in this study comprises a portion of Rapaport’s 
[9] experimental and control groups. The In- 
diana sample was drawn from several Veterans 
Administration installations in the state of In- 
diana and has been presented by Spaner [14]. 

(b) The adjustment variable was broken into three factors, classification being determined on the basis of diagnosis of schizophrenia, psychoneurosis, and “well-adjusted” normal. In the Indiana sample the diagnosis of “well-adjusted” was determined by the investigator on the basis of scores on the Minnesota Multiphasic Personality Inventory.

(c) The intelligence variable was also broken into three factors, accomplished by ordering IQ equivalents (total Bellevue weighted scores) to low, medium, and high intellectual levels. Limiting scores for the middle range were set at plus and minus one probable error from the total sample mean. The mean weighted score for the sample was approximately 105 with a probable error of 10.
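
The scatter score and the three-way classification described in (a) through (c) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the investigator’s procedure: the text does not say whether the individual standard deviation used N or N - 1 in the denominator (the population form is assumed), the treatment of scores falling exactly on a limiting value is assumed, and the subject data and the function names `scatter`, `intelligence_level`, and `subclass` are hypothetical.

```python
from itertools import product
from statistics import pstdev

ADJUSTMENTS = ("schizophrenic", "psychoneurotic", "well-adjusted normal")
LEVELS = ("low", "medium", "high")
LOCALITIES = ("Kansas", "Indiana")

SAMPLE_MEAN = 105.0     # approximate sample mean reported in the text
PROBABLE_ERROR = 10.0   # probable error reported in the text


def scatter(subtest_scores):
    """Intra-individual variability: standard deviation of one subject's subtest scores."""
    return pstdev(subtest_scores)


def intelligence_level(score):
    """Middle range is the mean plus and minus one probable error (95 to 115 here)."""
    if score < SAMPLE_MEAN - PROBABLE_ERROR:
        return "low"
    if score > SAMPLE_MEAN + PROBABLE_ERROR:
        return "high"
    return "medium"


def subclass(adjustment, score, locality):
    """Cell of the 3 x 3 x 2 design to which a subject belongs."""
    return (adjustment, intelligence_level(score), locality)


# All cells of the design: 3 adjustment types x 3 levels x 2 localities.
cells = list(product(ADJUSTMENTS, LEVELS, LOCALITIES))

# Made-up subject: ten subtest weighted scores and a total score of 98.
subject_scores = [12, 9, 14, 7, 11, 10, 13, 8, 12, 10]
subject_scatter = scatter(subject_scores)
subject_cell = subclass("psychoneurotic", 98, "Indiana")
print(len(cells), subject_cell, round(subject_scatter, 2))
```

Each subject contributes one scatter value to one of the 18 cells, and it is the means of these cells that the later analyses compare.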

Thus the data were classified with respect to three variables in a 3 × 3 × 2 factorial design and the 18 subclass means were tested for heterogeneity, after subclass variances were found to be homogeneous. Modifications in ordinary analysis of variance procedure were introduced, because certain irregularities in the data, particularly the presence of unequal and disproportionate subclass frequencies, rendered ordinary methods inadequate. Employing statistical methodology described by Snedecor and others [11, 12, 13, 18], a thorough analysis of variance was performed and all logical combinations of subclass means were studied for heterogeneity.
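
The over-all test for heterogeneity of subclass means mentioned above can be illustrated by an ordinary one-way F ratio, which remains valid when the subclasses differ in size. The Python sketch below is offered only as an illustration of that test. It does not reproduce the modified procedures for disproportionate subclass frequencies that the text attributes to Snedecor and others, the data are made up, and the function name `one_way_f` is hypothetical.

```python
def one_way_f(groups):
    """F ratio of between-group to within-group mean squares, with its degrees of freedom."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares weights each squared deviation by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group (error) sum of squares pools deviations from each group's own mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Made-up scatter scores for three unequal groups.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0, 5.0], [6.0, 7.0]]
f_value, df_b, df_w = one_way_f(groups)
print(f"F = {f_value:.2f} with {df_b} and {df_w} degrees of freedom")
```

Applied to 18 subclasses and 352 subjects, the same arithmetic gives 17 and 334 degrees of freedom for the subclass test.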


Results 


A total of 22 factorial designs were written and analysed in the execution of this study. These designs systematically pulled a maximum of information from the data and enabled the investigator to isolate successfully the source of heterogeneity within the over-all design of 18 subclasses. It was convenient to group the results of these designs into five summary tables, since something slightly different was accomplished with each group. Space does not permit the tabular presentation of all designs; nor indeed is it necessary to do so, since many of them describe preparatory analyses leading to certain crucial and decisive tests of difference which reveal the ultimate source of heterogeneity in the over-all design.


Table 1

Preliminary Analyses of Variance

Design 1-A: Preliminary Analysis of Variance using Original Data with Unequal and Disproportionate Class Numbers in the Total Sample
Method: Ordinary Method
Test of Significance    F Ratio
Subclasses/Error        2.92*     1 ¢
Intelligence/Error      11.03**   4.02
Adjustment/Error        y .
Locality/Error          f 26

Design 1-B: Preliminary Analysis of Variance in the Kansas Sample using Original Data with Unequal and Disproportionate Class Numbers
Method: Ordinary Method
Test of Significance    F Ratio
Subclasses/Error        +51       2.00
Intelligence/Error      11.92     4.06
Adjustment/Error        6.92*     4.0%

Design 1-C: Preliminary Analysis of Variance in the Indiana Sample using Original Data with Unequal and Disproportionate Class Numbers
Method: Ordinary Method
Test of Significance    F Ratio
Subclasses/Error        1.7%      3.06
Intelligence/Error      1.70*     4.04
Adjustment/Error        4.16*     2.0

*Significant at .05 level.
**Significant at .01 level.

A preliminary analysis of variance described 
in Table 1 gives an unbiased test for the homo- 
geneity of the 18 subclass means of the total 
sample, as well as reveals essential differences 
between the Kansas and Indiana subsamples. 
An F ratio of 2.92, since it is significant be- 
yond the 1 per cent level of confidence, dem- 
onstrates that even though their variances are 
homogeneous, the means of the 18 subclasses 
are different. This finding furnished statistical 
justification for continuing the analysis. It 
should be stressed that the preliminary anal- 
ysis described above, while it gives an unbiased 
estimate of heterogeneity in the over-all de- 
sign, gives only biased estimates of the effects 
of variables because of the influence of unequal 
and disproportionate frequencies in the var- 
ious subclasses. 

A second group of designs, as well as all subsequent designs, were written with the purpose of circumventing the influence of the disproportionality of the data. Using special methods described by Snedecor [11], the effects of three independent variables were again studied, but this time in a two-way classification, thus leaving one of the variables uncontrolled. These tests reveal highly significant differences between means, attributable to the effects of intelligence level and adjustment type upon scatter. There is also a strong indication of a significant interaction between these two variables. These differences may be spuriously great because of the uncontrolled third variable, but this offers no serious difficulty in psychological interpretation. Highly significant effects are doubtlessly recognized by the experienced clinician who appraises the Bellevue scattergram as an aid in diagnosis. What these analyses indicate, however, is that the effects of intelligence level and adjustment type may not be mutually independent. Statistically, this phenomenon is demonstrated by the presence of interaction between the adjustment and intelligence variables. Clinically, it may mean that the psychodiagnostician who appraises a Bellevue scatter pattern is impressed not only by the effect of adjustment, but by the effect of a given intelligence level as well.

A third group of designs are statistically 
sound, since the Kansas and Indiana subsam- 
ples are analysed separately, thus providing a 
mutual control of the adjustment and intelli- 
gence variables in a two-way classification. It 
is obvious that in these designs, precise in- 
formation concerning the effect of geograph- 
ical locality is lost; but by making a subjec- 
tive comparison of the subsample results, it 
seemed clear that the significant effects of the 
adjustment and intelligence variables were at- 
tributable to the Kansas sample, since neither 
of these variables produced significant effects 
in the Indiana sample. It was mentioned that 
this finding was suggested by the preliminary 
analysis of variance, described in Table 1. No- 
tice the curious discrepancy in Design 1-C 
where both the adjustment and intelligence ef- 
fects appear significant at the 5 per cent 
level of confidence while the effect of total 
subclasses fails to reach significance. The ap- 
parent significant effects revealed here are 
thought to be spurious, since in subsequent de- 
signs, when the influence of disproportional- 
ity of class frequencies was circumvented, the 
Indiana sample revealed no significant effects. 

A further modification of design introduced a mechanical control of variables which is methodologically equivalent to traditional experimental technique. It was possible, for example, to study the effects of adjustment and geographical locality using only subjects in the low intellectual group, and in subsequent designs to repeat this procedure with medium and high intellectual groups. In this way, the intelligence variable was controlled by using subjects “matched” with respect to intelligence in that they fell within a restricted intellectual range. In a similar way, the effects of intelligence and locality were studied using only subjects of a common adjustment type. Results of these experiments reveal that the heterogeneity of means within the adjustment variable occurs only in the low intellectual group of subjects. The effect of adjustment is essentially negligible at the medium and high intellectual levels. Similarly, the heterogeneity of means within the intelligence variable occurs only among schizophrenics.


Thus, by systematic analysis of variance of logical combinations of subclasses, the observed heterogeneity within the over-all design was traced to two sources: (a) Schizophrenics of different intelligence levels show differences in scatter, and (b) subjects with low intelligence of different adjustment types show scatter differences. A final group of designs, presented in Table 2, carried the analysis to its logical completion and demonstrated that excessive variability on the Bellevue Scale arises only from those schizophrenics with low intelligence. There are no statistically significant differences between schizophrenics of medium and high intelligence; nor do schizophrenics at those intellectual levels differ significantly from other adjustment types.


Conclusion And Discussion 


Intra-individual variability scores on the Wechsler-Bellevue Intelligence Scale for 352 subjects were ordered to a 3 × 3 × 2 factorial design, where classification was made with respect to the three independent variables of intelligence level, adjustment type, and geographical locality. Analysis of variance results warrant the following conclusions:

1. The 18 subclass means resulting from the 3 × 3 × 2 classification were heterogeneous, while the variances of the subclasses were homogeneous.

2. A Kansas sample of subjects seemed to be more variable on the Bellevue Scale than did a similar sample of Indiana subjects.

3. There was a strong indication of the presence of interaction between the adjustment and intelligence variables as influences of Bellevue scatter.

4. Extreme scatter on the Bellevue Scale is characteristic of only those schizophrenics with low intelligence. There was no strong indication that differences in scatter exist between neurotics and well-adjusted normals;
Table 2

Final Analyses of Variance

Design 19: Analysis of Variance of Main Effects of Locality and Adjustment (Excluding Psychotics) in the Low Intellectual Range, Interaction Negligible
Method: Weighted Diff. Between Means
Test of Significance    F Ratio
Adjustment/Error        ‘ i ; 4   H4
Locality/Error          l         +04

Design 20: Analysis of Variance of Main Effects of Locality and Adjustment (Excluding Normals) in the Low Intellectual Range, Interaction Negligible
Method: Weighted Diff. Between Means
Test of Significance    F Ratio
Adjustment/Error        6.59*     :.92
Locality/Error          602*      409

Design 21: Analysis of Variance of Main Effects of Locality and Intelligence (Excluding Low Group) among Schizophrenics, Interaction present
Method: Weighted Squares of Means
Test of Significance    F Ratio
Intelligence/Error      78        4.02
Locality/Error          9         4.02

Design 22: Analysis of Variance of Main Effects of Locality and Intelligence (Excluding High Group) among Schizophrenics, Interaction present
Method: Weighted Squares of Means
Test of Significance    F Ratio
Intelligence/Error      67*       92
Locality/Error          9.06**    3.92

*Significant at .05 level.
**Significant at .01 level.
and those schizophrenics with medium or high
intelligence are no more variable on the Belle- 
vue Scale than any other adjustment types stud- 
ied in this project. 

The finding that the scatter differences which seem to characterize Rapaport’s experimental groups [9] do not hold for a similar group of Indiana subjects is rather baffling at first glance, primarily because the “locality variable” is poorly defined. Obviously, the fact that one sample was drawn from Kansas and the other from Indiana offers no psychological interpretation of the differences that were found. These differences were, nevertheless, found; it is an empirical fact, and the absence of an adequate psychological label, while somewhat disturbing, does not alter the facts. Whatever the psychological variables are which account for these differences between the Kansas and Indiana subsamples, the variable as defined in the present set of experimental designs is a systematic one and cannot rightly be attributed to error. By controlling the variable in this study the investigator was provided a more accurate estimate of experimental error by which to test the main effects of the adjustment and intelligence variables. The psychological “meaning” of this difference must await future research, but in the meantime caution seems indicated in accepting at face value scatter indices for the Bellevue test which have been arrived at under conditions different from those which govern the group upon which those indices are to be applied.

Statistically, the interaction between the 
variables of adjustment and intelligence indi- 
cates that they are not independent as influen- 
cers of scatter. Clinically, this will mean that 
the impact of both variables is felt by the clini- 
cian who uses the Bellevue scattergram as an 
aid in diagnosis. The fact that only certain 
schizophrenics show excessive scatter will ob- 
viously limit the use of the scattergram in de- 
tecting psychosis. The rather widespread 
clinical assumption that “excessive variability 
[on the Bellevue Scale] is the most ominous 
sign of maladjustment” [9] should doubtless 
be reinterpreted in light of new knowledge. 


Received May 7, 1951. 
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Digit Span as an Anxiety Indicator 


Stanley Moldawsky and Patricia Corcoran Moldawsky 
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The Digit Span subtest of the Wechsler- 
Bellevue Scale has been considered diagnos- 
tically helpful by clinicians as an indicator of 
the presence of anxiety [4, 7, 8]. A usual 
procedure is to note discrepancies between the 
Digit Span score and that of the total Verbal 
Scale. Rapaport uses Vocabulary level as a 
comparison measure; “A Digit Span score 
much below the vocabulary level . . . is main- 
ly indicative of the presence of anxiety” [7, 
p. 193]. 

At the same time, however, the digit score 
is not considered adequate in itself to identify 
the clinical group, anxiety-neurotic, for two 
reasons: (a) other clinical groups do poorly 
on this subtest [1; 11, p. 84] and (b) some 
anxiety neurotics do not show this differential 
decrement [2, 9]. One explanation of dis- 
crepant findings may lie in the unreliability of 
diagnostic judgments of manifest anxiety, par- 
ticularly since manifestly anxious individuals 
may show other kinds of symptoms also. 


In order to investigate the hypothesis that 
anxiety results in a greater decrement in 
Digit Span scores than in Vocabulary scores 
it would appear reasonable to attempt to ma- 
nipulate anxiety in normal subjects in a con- 
trolled situation and thus determine its effects. 


Procedure 


Introductory psychology students served as 
subjects. All had been previously tested on 
the Full Scale Wechsler-Bellevue. They were 
selected from a group of eighty-four on the 
basis of their Digit Span scores not deviating 
more than two points from their verbal mean. 

Four groups of eight Ss each were matched 
according to Verbal IQ. The four groups were 
assigned at random to two control and two 
experimental conditions. Control Group 1 and 
Experimental Group 1 received Digit Span 
first and Vocabulary second; Control Group 2 
and Experimental Group 2 received the opposite 
order of presentation. 

1 This paper was presented at the Midwestern 
Psychological Association, Chicago, April 27, 1951. 
The authors are indebted to Dr. I. E. Farber under 
whose direction this experiment was conducted. 

The procedure for the control groups was 
like that of an ordinary clinical testing situation. 
Every effort was made to keep the subject 
confident in his performance and motivated 
to do his best. The explanation was 
given that the examiner was interested in the 
subtests and so wanted to give them to a number 
of people who had had them before. The 
regular Wechsler-Bellevue instructions were 
read. 

For the experimental subjects the procedure 
was designed to be anxiety-producing. The 
techniques used were aimed at arousing feelings 
of inadequacy which would contribute specifically 
to fear of failure in the testing situation, 
on the assumption that these reasons are 
the most typical during a clinical examination. 

The subjects were met outside the testing 
room and questioned by an “apprehensive” examiner 
as to whether they had reported to the 
correct experimental room. Since class credit 
was given for serving as subjects the imminent 
rejection by the examiner was assumed to be- 
gin to raise their anxiety level. After a search 
through the list of names, the S’s name was 
found and he was brought into the testing 
room and seated. The examiner then hooked 
up a recording apparatus and turned on micro- 
phone switches. This was actually a “dummy” 
apparatus, yet red lights indicating recording 
was taking place and the experimenter’s atti- 
tude suggested actual recording. Interrogation 
afterwards indicated that all the Ss believed 
they were being recorded. Two hand-microphones 
were present on the table. The experimenter 
then sat down opposite the subject and 
gave the following instructions in a sympathetic 
fashion so as not to arouse hostility. 
The examiner had memorized the instructions 
and they were presented conversationally. 


You took a psychological test earlier this semes- 
ter. Do you remember? [If there were any ques- 
tions, E mentions blocks, pictures, words, etc.] Well, 
you have been selected from the group that took 
that test because there was something very odd 
about your test behavior. You have been called 
back to see if maybe you couldn't improve it, that 
is, make it more like the rest of the group. 

Like the one you had before, this is a test of 
your intelligence. Let's see, you're in Mr. ___'s 
class? Hmm. [E looks in psychology conference 
section grade book] Section ___? [E puts 
grade book away] Would you mind saying your 
name loudly into the microphone? Spell it out. 
Throughout the test the graduate clinical assistant 
will be present to observe you. Please speak clearly 
into the microphone as we are recording your responses 
as an additional check. Try to do as well as 
you can as your record was really very poor last 
time. The graduate clinical assistant will discuss it 
with you and also answer any of your questions 
when I have finished. 


The E left the room and brought in the 
“graduate clinical assistant.” Throughout the 
test the “assistant” sat behind the subject, out 
of view. The same experimenter tested all 32 
Ss. Often some of the Ss injected questions 
which were parried and referred to the “assistant.” 
The “assistant” stated that he had done 
well on the retest. Ss were requested not to 
tell others what had occurred because more Ss 
were to be tested. 


Table 1 

Test-Retest Differences of the Four Sub-Groups 

                          Test       σ      Retest      σ      Diff. 

Control Group 1 
  Vocabulary             12.62     1.17     13.00     1.00    + .38 
  Digit Span             11.75      .16     12.50     2.64    + .75 

Control Group 2 
  Vocabulary             12.50      .50     13.25      .83    + .75 
  Digit Span       [illegible]     2.37  [illegible]  2.28    +1.00 

Experimental Group 1 
  Vocabulary             11.63      .85     13.38      .66    + .75 
  Digit Span             12.13     1.96     12.00     2.96    − .13 

Experimental Group 2 
  Vocabulary             12.13      .78     12.63      .68    + .50 
  Digit Span             12.88     2.14     10.75     2.77    −2.13 





Results 


The means of the weighted scores for both 
subtests for each group on the initial test and 
the retest are presented in Table 1. The Digit 
Span retest scores for both control groups were 
higher than the original test. However, this 
rise was not significant for either group; t's 
equaled .74 and .89 for the two groups. Both 
experimental groups showed an absolute drop 
on retest of Digit Span; however, only the 
second group, which received Vocabulary first 
and Digits second, was significantly different. 
Its t equaled 2.36 (p < .05). 


Table 2 

Mean Test-Retest Differences* 

                   Controls             Experimentals 
Test             Mean       σ          Mean       σ 
Vocabulary      + .562     .622       + .437     .747 
Digit Span      + .875    8.234       −1.125    8.859 

*These mean differences were calculated by subtracting 
the initial test score from the retest for each individual 
and then finding the mean of these differences 
for all individuals in each group. 
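
The procedure described in the footnote to Table 2 can be sketched in Python as follows. This is only an illustration: the scores are made-up values, not data from the study, and the function name `mean_retest_difference` is hypothetical.

```python
from statistics import mean


def mean_retest_difference(initial, retest):
    """Mean of the per-subject (retest - initial) differences."""
    if len(initial) != len(retest):
        raise ValueError("each subject needs an initial and a retest score")
    # Subtract the initial score from the retest score for each individual,
    # then average those differences over the group.
    return mean(r - i for i, r in zip(initial, retest))


# Illustrative weighted scores for four hypothetical subjects (not study data).
initial_scores = [12, 11, 13, 12]
retest_scores = [13, 11, 12, 14]
print(mean_retest_difference(initial_scores, retest_scores))
```

A positive result corresponds to a rise on retest and a negative result to a decrement, which matches the sign convention used in Table 2.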


The control groups were not significantly 
different from each other, nor were the ex- 
perimental groups significantly different from 
each other. Therefore, the control groups were 
combined and compared with the combined 
experimental groups. These data are presented 
in Table 2 as mean differences between test 
and retest. 


The mean change in Vocabulary for the 
controls was + .562 and the change for the 
experimentals was + .437. Both groups then 
showed a rise in vocabulary but these differences 
were not significant. The t equaled .414 
(p > .35); degrees of freedom equaled 30. 

The control groups showed a rise in Digit 
Span scores on retest whereas the experimental 
groups showed a decrement. This difference 
was significant in the expected direction. The 
t equaled 1.870 (p < .05); degrees of freedom 
equaled 30 (single-tailed hypothesis). 

The mean rise in Vocabulary scores was not 
significantly different from the mean rise in 
Digit Span scores for the Control groups. The 
mean rise in Vocabulary scores was significantly 
different from the mean drop in Digit 
Span scores for the Experimental groups. The 
t equaled 1.924 (p < .05); degrees of freedom 
equaled 15. 


Finally, the difference between the mean 
rise in Vocabulary scores (+ .562) and the 
mean rise in Digit Span scores (+ .875) for 
the control groups was compared with the difference 
between the mean rise in Vocabulary 
scores (+ .437) and the mean drop in Digit 
Span scores (− 1.125) for the experimental 
groups. This difference between differences 
was significant at the 5 per cent level. The 
t equaled 1.726; degrees of freedom equaled 
30. Schematically, this was calculated as follows: 

Controls [(Vocab.2 − Vocab.1) minus (Digit2 − Digit1)] 
minus Experimentals [(Vocab.2 − Vocab.1) minus (Digit2 − Digit1)], 

where 1 denotes the initial test and 2 the retest. 
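
The arithmetic of this "difference between differences" can be retraced with the mean changes reported in the text. The sketch below reproduces only the point estimate. It does not reproduce the reported t, which depends on individual scores not given here. The variable and function names are illustrative.

```python
# Mean test-retest changes reported in the text (retest minus initial test).
controls = {"vocabulary": 0.562, "digit_span": 0.875}
experimentals = {"vocabulary": 0.437, "digit_span": -1.125}


def vocab_minus_digit(changes):
    """Vocabulary change minus Digit Span change for one condition."""
    return changes["vocabulary"] - changes["digit_span"]


control_gap = vocab_minus_digit(controls)            # 0.562 - 0.875
experimental_gap = vocab_minus_digit(experimentals)  # 0.437 - (-1.125)
difference_between_differences = control_gap - experimental_gap

print(round(control_gap, 3))
print(round(experimental_gap, 3))
print(round(difference_between_differences, 3))
```

The experimental groups show a much larger gap between the Vocabulary change and the Digit Span change than the controls do, which is the contrast the final t test evaluates.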

Discussion 


The experimental results support the hy- 
pothesis that anxious individuals do relatively 
poorly on Digit Span tests. As noted in the 
introduction, the authors mean by anxious in- 
dividuals those who are anxious in the situa- 
tion rather than those who are clinically diag- 
nosed anxiety-neurotic. Although research us- 
ing clinical groups is at variance on this issue 
[2, 7] this experiment tends to support Rap- 
aport’s conclusions [7, p. 84]. It is interesting 
that Gilhooley disagrees with the hypothesis 
on the basis of comparing an anxiety-neurotic 
group with a total neurotic group minus the 
anxiety group. Since anxiety is a common 
symptom among all the neuroses, it is not sur- 
prising that the anxiety-neurotics do not be- 
have differently from neurotics in general on 
this subtest. 


An obvious question raised by the results is 
why the experimental group which received 
digits first did not drop significantly in Digit 
Span scores as did the experimental group 
which received Digits after the Vocabulary 
test. The answer might lie in the order of 
presentation, as this was the only methodolog- 
ical difference between the groups. It seems 
likely that the failures experienced at the more 
difficult end of the Vocabulary list combined 
with the examiner’s lack of encouragement 
tended to enhance the effects of the instructions. 
On this basis then, the experimental 
group receiving Vocabulary first should have 
the highest general anxiety level by virtue of 
the interacting effects of the procedure, attitude 
of the examiner, and the possibility of 
experiencing failure; the experimental group 
receiving Digits first should be second because 
of the procedure, and (to a lesser extent) the 
attitude of the examiner; and finally, the control 
groups the least, if any. The prediction 
from the hypothesis is that Digit Span drops 
on retest would parallel this “breakdown” by 
anxiety level, and this is what actually occurred. 
This consistency should be interpreted 
with caution, of course, since statistically, the 
order of presentation was not significant. Possibly 
the latency in the building up of the 
anxiety reaction to the level at which it interferes 
with performance on Digits is longer 
than the time between the giving of the instructions 
and the presentation of the Digits 
as the first test. 

An incidental result of the experiment for 
which some explanation might be sought is 
the consistent increase in Vocabulary under 
both control and experimental conditions. Two 
subjects mentioned spontaneously in the discussion 
with the “clinical assistant” that they 
had looked up some of the words since the 
last test. This suggested that perhaps a suf- 
ficient number of the Ss were motivated either 
to look up the words, ask for definitions, or 
at least attend to them if they happened upon 
them, to account for larger scores on retest. 
It is assumed, furthermore, that this is more 
likely to have occurred with this group since 
the daily environment (college) is highly 
verbal. 

The comparatively greater sensitivity of the 
Digit Span subtest to anxiety can be under- 
stood in terms of learning. Several investiga- 
tions [4, 6, 10, 12] indicate that anxiety is 
a potent factor in learning. When anxiety 
level is high, learning of relatively complex 
material is less efficient. A Digit Span series 
can be defined as a learning task with one trial, 
and a high anxiety level should then have an 
interfering effect. The Vocabulary subtest on 
the other hand taps responses that are more 
or less already learned. Montague and 
Schneider [6] found that once their anxiety 
Ss had reached a certain criterion of learning, 
their retention was as good as that of the nonanxious 
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subjects. Therefore, the already-learned Vo- 
cabulary should not be expected to differenti- 
ate the groups. 

The obvious implication of the experiment 
that a lowered Digit Span score, as compared 
with other tests, is diagnostic per se of anxi- 
ety is not justified. While anxious people are 
expected to do poorly on Digits, other factors 
may adversely affect performance also, e.g., or- 
ganic conditions, excessive fatigue, or lack of 
motivation [11, pp. 84-85]. 

An implication of the results for clinical 
practice is a cautious interpretation of IQ’s 
which are partially dependent on Digit Span 
subtests. Clinicians commonly use other data 
in evaluating intellectual level besides the absolute 
IQ “number,” and the results of this 
experiment support this practice. 

The Williams study [12] similarly demonstrates 
the sensitivity of the Digit Symbol subtest 
to stress conditions, which adds further 
justification for a more flexible approach. An 
additional implication is that, if possible, the 
subtests should be manipulated so that the 
Digit Span does not follow a failure experience. 

This study is consistent with the general 
finding that the variability of Digit Span 
scores is relatively large and the variability of 
Vocabulary scores is relatively small [3]. It 
can be seen from Table 2 that the variability 
of Digit Span was large but similar for both 
the controls and the experimentals. Therefore, 
despite this amount of variability, the drop in 
Digit Span for the experimental group was 
significant. 


Summary 


1. The experiment was designed to test the 
hypothesis that anxiety will function so as to 
cause a significantly greater decrement in 
Digit Span scores than in Vocabulary scores. 

2. Thirty-two college subjects previously 
tested on the Full Scale Wechsler-Bellevue 
were retested with the Vocabulary and Digit 
Span subtests. Half of the Ss received the 
usual clinical rapport-establishing procedure 
and half the anxiety inducing procedure. One 
of each of these groups received Digit Span 
first and Vocabulary second, and the other 
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two, the opposite order of presentation. Groups 
were matched according to Verbal IQ. 




3. The results supported the hypothesis. 
They would tend to reinforce the clinician’s 
confidence in the Digit Span subtest as being 
sensitive to situational anxiety and in the Vo- 
cabulary subtest as being relatively impervi- 
ous to it. 


4. It was suggested that the Digit Span is 
more sensitive than Vocabulary to anxiety be- 
cause it is essentially a learning task, while 
Vocabulary taps already-learned responses. 


Received May 24, 1951. 
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An Evaluation of Published Short Forms 
of the Wechsler-Bellevue Scale¹ 


Fred H. Herring 


VA Mental Hygiene Clinic, Denver, Colo 


The history of short forms of the Wechsler- 
Bellevue Intelligence Scale is replete with pub- 
lished research and private opinion in clinical 
practice. But, except for McNemar’s [9] 
study, no intensive attempt has been made to 
evaluate all the short forms proposed in the 
literature. This study is such an attempt and, 
as well, it includes a consolidated report of 
previous research. 


The pros and cons of short form usage are 
not matters of discussion in this paper. Ab- 
breviated forms of the Wechsler-Bellevue are 
in use today, particularly in mobilizing the 
armed services, and it behooves clinical psycho- 
logists to gain as much evidence as possible 
pointing to the best forms to use when examin- 
ing time is limited. 

Table 1 is a consolidated report of informa- 
tion pertinent to previously reported research 
on short form combinations. It includes auth- 
ors and references, reported correlations, and 
an indication of the samples used. Abbreviations 
throughout it and this paper are as follows: 
Information subtest (I), Comprehension 
(C), Arithmetic (A), Digit Span (D), 
Similarities (S), Vocabulary (V), Picture 
Arrangement (PA), Picture Completion 
(PC), Block Design (BD), Object Assembly 
(OA), Digit Symbol (DS). It should be not- 
ed that the Vocabulary used in Hunt’s [5] 
investigation was not the complete Wechsler- 
Bellevue subtest. He substituted, instead, the 
abbreviated 15-item subtest drawn by R. L. 
Thorndike from the Vocabulary of the 1937 
revision of the Stanford-Binet, Form L. 


¹ Reviewed in the Veterans Administration and 
published with the approval of the Chief Medical 
Director. The statements and conclusions published 
by the author are the result of his own study and 
do not necessarily reflect the opinion or policy of 
the Veterans Administration. 


Subjects 


For the current study samples from two 
populations were drawn to try out the twenty- 
one combinations of Table 1. One sample 
may be considered a “normal” group, the other 
a group of “abnormals.” 

The normal group consisted of all the white, 
male veterans of World War II who were examined 
with the Wechsler-Bellevue during the 
calendar year 1946 at a Connecticut vocational 
counseling service which had a contract with 
the Veterans Administration. These men, 92 
in number, were receiving government compensation 
for some non-neuropsychiatric injury 
incurred in or aggravated by military service. 
The disabilities ranged from malaria to gunshot 
wounds. They were considered “normal” 
insofar as neuropsychiatric disabilities were 
concerned, since no such impairments were 
discovered during government hospitalization 
and examination. 

The abnormal group consisted of one hundred 
white, male veterans of World War II 
who had made application for and had received 
treatment at the Veterans Administration 
Mental Hygiene Clinic, Denver, Colorado, 
for various neuropsychiatric disabilities for 
which they were receiving government compensation. 
The group was secured by extracting, 
in consecutive order, each Wechsler-Bellevue 
record from the test files of the clinic 
until one hundred were obtained. Excluded 
from the group were those whose neuropsychiatric 
diagnosis indicated organic brain 
damage, a psychotic condition, or mental deficiency. 
The sample was typical of the clinic’s 
population in terms of age, education, intelligence, 
and clinical diagnosis. Anxiety reactions 
constituted approximately 60 per cent of 
the group, conversion reactions about 10 per 
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Short Form 


C-V 
C-A 


V-D 
D-PA 


V-A 
I-BD 
S-PA 
V-PA 
C-A-S 


C-A-PA 


V-C-DS 


S-D-PA 
C-BD-DS 
C-V-S 
V-I-BD-S 
C-S-D-BD 


I-PC-PA-DS 


V-C-BD-PC 


C-A-BD-DS 
C-A-D-PA-S 


C-A-PC-BD-DS 





Table 1 

Reported Correlations Between Short Forms and Wechsler-Bellevue 

Investigator      Correlation   Sample 

Hunt [5]          .80*    528 Naval recruits considered “normal” 

Cotzin [1]        .532    154 “high grade” or “borderline” defectives 
Cummings [2]      .933†   418 Naval recruits of questionable intellect 
Hunt [5]          .87*    528 Naval recruits considered “normal” 
McNemar [9]       .833    355 cases, ages 20-34, of Wechsler’s norm group 
Patterson [10]    .851    50 Army closed ward psychotics and neurotics 
Hunt [5]          .77*    528 Naval recruits considered “normal” 

Cotzin [1]        .746    154 “high grade” or “borderline” defectives 
Gurvitz [4]       .90     523 male federal prisoners, ages 17-64 

Hunt [5]          .82*    528 Naval recruits considered “normal” 
McNemar [9]       .770    355 cases, ages 20-34, of Wechsler’s norm group 
Patterson [10]    .815    50 Army closed ward psychotics and neurotics 
Hunt [5]          .85*    528 Naval recruits considered “normal” 
McNemar [9]       .884    355 cases, ages 20-34, of Wechsler’s norm group 
Hunt [5]          .87*    528 Naval recruits considered “normal” 

Hunt [5]          .80*    528 Naval recruits considered “normal” 

Cotzin [1]        .588    154 “high grade” or “borderline” defectives 
Hunt [5]          .91*    528 Naval recruits considered “normal” 

Hunt [6]          .92     100 Naval recruits 

Kriegman [7]      .861    207 Army “patients”, none psychotic 

McNemar [9]       .864    355 cases, ages 20-34, of Wechsler’s norm group 
Patterson [10]    .890    50 Army closed ward psychotics and neurotics 
Rabin [12]        .80     92 female student nurses, ages 19-25 

Rabin [12]        .956    200 state hospital patients, ages 15-36 

Springer [13]     .92     100 Naval recruits suspected of retardation 
Hunt [5]          .94*    528 Naval recruits considered “normal” 
Patterson [10]    .934    50 Army closed ward psychotics and neurotics 
Patterson [11]    .896    100 “more or less normal males” 

Hunt [5]          .94*    528 Naval recruits considered “normal” 
McNemar [9]       .912    355 cases, ages 20-34, of Wechsler’s norm group 
Hunt [5]          .87*    528 Naval recruits considered “normal” 
Kriegman [7]      .910    207 Army “patients”, none psychotic 

Cotzin [1]        .835    154 “high grade” or “borderline” defectives 
Geil [3]          .966    250 hospitalized male federal prisoners 
Patterson [10]    .936    50 Army closed ward psychotics and neurotics 
Cotzin [1]        .741    154 “high grade” or “borderline” defectives 
Geil [3]          .952    250 hospitalized male federal prisoners 
McNemar [9]       .909    355 cases, ages 20-34, of Wechsler’s norm group 
Patterson [10]    .948    50 Army closed ward psychotics and neurotics 
Patterson [10]    .962    50 Army closed ward psychotics and neurotics 
Patterson [11]    .955    100 “more or less normal males” 

McNemar [9]       .932    355 cases, ages 20-34, of Wechsler’s norm group 
Hunt [5]          .96     46 high-school students 

McNemar [9]       .916    355 cases, ages 20-34, of Wechsler’s norm group 
McNemar [9]       .944    355 cases, ages 20-34, of Wechsler’s norm group 

†Cummings utilized the verbal scale score as criterion. 

*In those studies indicated by an asterisk Hunt utilized the C-A-D-PA-S short form as criterion instead of the 
full scale Wechsler-Bellevue. 
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cent, and the remainder of the major classi- 
fications in Veterans Administration nomen- 
clature was represented by from 1 to 8 per 
cent of the total. 

The normal group mean age was 26.2, the 
abnormal was 29.0; the difference was statis- 
tically significant at the 1 per cent level. Mean 
education was 9.8 years for normals, 10.9 for 
abnormals, also significant at the 1 per cent 
level. The Wechsler-Bellevue intelligence 
quotients were 108.4 and 112.1, respectively, 
with a statistically significant difference at the 
5 per cent level. Thus, two different popula- 
tions are considered to have been represented 
by the samples. 


Results 


Table 2 presents the results secured with 
the proposed combinations for each sample. 
Pearson product-moment correlations were 
obtained between each set of short form 
weighted scores and the full scale weighted 
scores. In addition, these correlations were 
transformed into Fisher’s z’s [8]. By this 
method, an interpretable difference between 
correlations with normals as compared with 
abnormals was obtained. Also, the results for 
the two samples then could be averaged for 
each combination and reconverted into correlation 
coefficients. 

Table 2 

Correlation Coefficients Between Full Scale and 
Short Form Results, Differences Between Corresponding 
z’s, and Averaged Correlation Coefficients 

                  Correlation   Correlation                  Average 
                  for           for            Difference    Correlation 
Combination       Normals       Abnormals      in z          Coefficient 

I-BD              .80           .85            .16           .83 
V-A               .77           .84            .20           [illegible] 
C-A               .77           .73            .10           .75 
V-PA              .75           .86            .32           .82 
S-PA              [illegible]   .84            .25           .80 
D-PA              .72           .69            .06           .71 
C-V               .69           .82            .31           [illegible] 
V-D               .68           .79            .25           .74 
C-A-PA            .86           .88            .08           .87 
C-A-S             .84           .86            .07           .85 
C-BD-DS           .82           .91            .37           .87 
S-D-PA            .81           .85            .13           .83 
V-C-DS            .80           .89            .33           .85 
C-V-S             .76           .85            .26           .81 
V-C-BD-PC         .91           .92            .06           .92 
C-A-BD-DS         .85           .93            .40           .90 
C-S-D-BD          .85           .91            .27           .89 
I-PC-PA-DS        .83           .90            .29           .87 
V-I-BD-S          .83           .90            .29           .87 
C-A-PC-BD-DS      .87           .95            .50           .92 
C-A-D-PA-S        .87           .93            .32           .90 
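
The transformation described above can be illustrated with a short Python sketch. It assumes a simple unweighted average of the two z values, since the text does not say whether the averages were weighted by sample size. The function names are hypothetical. The example uses the I-BD correlations from Table 2.

```python
import math


def fisher_z(r):
    """Fisher's z transformation of a correlation coefficient."""
    return math.atanh(r)


def compare_correlations(r_normals, r_abnormals):
    """Return (difference in z, averaged correlation) for one short form."""
    z_n = fisher_z(r_normals)
    z_a = fisher_z(r_abnormals)
    difference = abs(z_a - z_n)
    # Assumption: unweighted mean of the two z values, converted back to r.
    average_r = math.tanh((z_n + z_a) / 2)
    return difference, average_r


# I-BD: r = .80 for normals, r = .85 for abnormals (Table 2).
diff, avg = compare_correlations(0.80, 0.85)
print(f"difference in z = {diff:.2f}, averaged r = {avg:.2f}")
```

For I-BD this gives a z difference of about .16 and an averaged correlation of about .83, in agreement with the tabled values.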

To select the best short form in each group 
of two, three, four, and five subtests, three 
criteria were used: (a) The size of the average 
correlation coefficients was considered 
first. This involved ranking the validities of 
the combinations in terms of their averaged 
correlation coefficients as though the study had 
been accomplished on one sample comprising 
both normals and abnormals. (b) The second 
criterion utilized the difference found between 
the correlation of a combination when used 
on the normals as compared with the abnormals. 
Should the difference be statistically significant, 
it was considered that the margin was 
too great for valid general clinical practice. 
To be significant at the 5 per cent level required 
a difference in z of .27, and significance 
at the 1 per cent level required .36. (c) The 
third criterion involved an analysis of the 
functions of the different subtests in terms of 
their usefulness as clinical instruments. First 
preference was to go to the combination which 
utilized a diagnostically loaded balance of 
“hold” versus “don’t hold” subtests based on 
Wechsler’s [14] list. Second preference was 
to go to the combination which included both 
verbal and performance type subtests, in order 
to sample both types of behavior. 
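
The second criterion amounts to comparing each tabled difference in z against the two stated thresholds. A minimal sketch follows, assuming that a difference exactly equal to a threshold counts as significant. The function name and return strings are illustrative.

```python
def second_criterion(z_difference, five_pct=0.27, one_pct=0.36):
    """Classify a difference in Fisher's z using the thresholds in the text."""
    # Assumption: a difference equal to a threshold is treated as reaching it.
    if z_difference >= one_pct:
        return "significant at 1 per cent"
    if z_difference >= five_pct:
        return "significant at 5 per cent"
    return "not significant"


# Differences in z taken from Table 2.
for form, diff in [("I-BD", 0.16), ("C-A-D-PA-S", 0.32), ("C-A-BD-DS", 0.40)]:
    print(form, "->", second_criterion(diff))
```

Under these thresholds I-BD is retained, C-A-D-PA-S differs at the 5 per cent level, and C-A-BD-DS differs at the 1 per cent level, which agrees with the selections reported in the text.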

Utilizing the above-suggested criteria, it is 
found that McNemar’s I-BD is the only two-subtest 
short form which utilized a 
combination of “hold” and “don’t hold” subtests, 
sampled both verbal and performance 
types of behavior and, with the highest average 
correlation, had not been rejected by the 
second criterion. 

Hunt’s C-A-PA was selected in the three-subtest 
group since it had the highest composite 
correlation of those not rejected by the 
second criterion and it combined both “hold” 
and “don’t hold” subtests, plus verbal and 
performance measures. 


Patterson’s V-C-BD-PC was best of the 
four-subtest group after C-A-BD-DS was 


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eliminated because of a significant difference, 
at the 1 per cent level, between normals and 
abnormals. V-C-BD-PC also contains “hold” 
and “don’t hold” subtests, plus verbal and 
performance subtests in equal numbers. 


The five-subtest results indicate that while 
McNemar’s C-A-PC-BD-DS is slightly better 
than Hunt’s C-A-D-PA-S in terms of the first 
criterion, the validity of the former is significantly 
different between the samples at the 
1 per cent level, while the latter’s validity is 
different at the 5 per cent level. However, it 
should be noted that both have correlations 
within the top four of the twenty-one combinations. 
The third criterion does make a clear 
distinction. While both forms contain 
both “hold” and “don’t hold” subtests plus 
verbal and performance measures, McNemar’s 
C-A-PC-BD-DS includes two “hold” subtests 
while C-A-D-PA-S includes only one; and the 
former comprises two verbal measures, while 
the latter has four but only one performance 
type. Accordingly, it was considered that McNemar’s 
C-A-PC-BD-DS was the better balanced 
of the two in terms of potential clinical 
usefulness. 


Summary 


To investigate the relative validities of pub- 
lished short forms of the Wechsler-Bellevue 
Intelligence Scale, two samples comprising one 
hundred ninety-two white, male war veterans 
were secured. One was a group of ninety-two 
“normals” who, during government hospitali- 
zation, revealed no neuropsychiatric disabil- 
ities. The other was a group of one hundred 
“abnormals,” patients in psychotherapy at the 
Denver Mental Hygiene Clinic of the Veter- 
ans Administration. 


Correlations between the weighted scores of 
each of the twenty-one short forms and the 
weighted scores of the full scale were com- 
puted for each of the samples. These corre- 
lations were transformed into Fisher’s z’s in 
order to compute the differences and the aver- 
ages for the combinations. 

The criteria for selection of the best forms 
were (a) highest correlation combining normal 
and abnormal samples, (b) lack of significant 
difference between correlation with 
normals as compared with abnormals, and (c) 
inclusion of clinically useful combinations, or 
a combination of verbal and performance subtests. 

The results obtained in this study as com- 
pared with the results of previous investiga- 
tors with different populations (Table 1) sug- 
gest that further studies should be carried out 
whenever testing must be done on groups not 
comparable with earlier ones. In addition to 
extending the dimensions of the groups to be 
investigated in terms of whatever differences 
exist, the use of larger samples should present 
more valid results by reason of their size. The 
following conclusions are considered valid 
within the limits of the populations sampled. 

The best short form combining two subtests 
of the eight examined was found to be Mc- 
Nemar’s Information and Block Design 
(I-BD). The best three-subtest form of the six 
in that group was Hunt’s Comprehension, 
Arithmetic, and Picture Arrangement (C-A- 
PA). Patterson’s Vocabulary, Comprehension, 
Block Design, and Picture Completion (V-C- 
BD-PC) appeared to be the best of the five 
four-subtest forms. McNemar’s Comprehen- 
sion, Arithmetic, Picture Completion, Block 
Design, and Digit Symbol (C-A-PC-BD-DS) 
was found to be the better of the two five- 
subtest short forms. 

Choice between these forms appears to be 
dependent upon the amount of time available 
to the user of such an instrument. The short 
forms increased in validity as they increased in 
size, the five-subtest being best. This followed 
logically since the short forms should get “better” 
as they approach in size the full scale 
Wechsler-Bellevue. Thus, the evidence points 
to the advisability of using as many subtests as 
possible in the combinations listed in the pre- 
ceding paragraph. 


Received May 24, 1951. 
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Some Defects of the Wechsler-Bellevue 


Milton S. Gurvitz 
Hillside Hospital and Adelphi College 


The almost universal use of the Wechsler- 
Bellevue Scale for measuring the intelligence 
of adults needs no documentation. Certain 
limitations and shortcomings of the Scale, al- 
though partially reported in some early re- 
views, are not nearly so widely recognized. 
These defects are especially relevant to the 
use of the Scale for research purposes, and for 
clinical diagnosis by intersubtest comparisons. 


The Standardization Population 


The standardization population has been 
justly criticized because it lacks a proper range 
both in geographical distribution and in oc- 
cupational spread. It is well known that the 
sample was drawn almost entirely from the 
New York City area. Also, more than reasonable 
doubt exists about the procedure which substituted 
“barbers, bakers, and teamsters” for 
farmers, and “similarly replaced” other missing 
occupational representatives [7, p. 109]. 

One even more important defect of the 
standardization population appears to have 
been overlooked. In applying the criteria of 
occupational level to women, who constituted 
half of the standardization group, no regard 
was paid to the fact that most women in all 
age groups are not gainfully employed. Indeed, 
when we examine the employment figures for 
women older than thirty, we find 70 to 90 
per cent of them in the “not gainfully em- 
ployed” category [7, p. 111]. As a result, there 
is little discrimination possible among them, 
and no way by which to compare the goodness 
of sampling to national standards. At the time 
of the standardization of the Scale, Wechsler 
could not have used the educational level of 
adults as a criterion, because such data became 
available on a national basis only with the 
1940 census. Any future standardization must 
use educational level as a sampling criterion, 
especially for unemployed women. The sam-
pling uncertainties of the 1937 Scale vitiate
Wechsler’s generalization that women are 
more intelligent than men. 


Standardization Procedures 


The standardization of the Scale needs to
be reviewed critically. One striking shortcom-
ing is that the Similarities subtest seems to
have been given to only one-third of all stand-
ardization subjects. All total score and IQ
tables are therefore based on a mixture of one-
third of the cases with ten tests, and two-
thirds of the cases with nine tests prorated [7,
pp. 85, 223].


The smoothing of the standard deviations 
of IQ’s involved violence to the data to make
them fit a preconceived hypothesis of a linear 
increase of the SD from age 20 to age 60. The 
discrepancies can be noted by comparing two 
columns of Wechsler’s Table 14 [7, p. 118]. 

Table 1

Variations in Number of Cases and Mean Scores in Two Samples Used in the
Wechsler-Bellevue Standardization

                 Standardization          Sample in Wechsler’s
                     Sample*               Tables 39 and 40†
Age Group          N      Mean              N     Sum of Means
20-24             160     98.8             120       103.1
25-29             195     95.9             125       100.8
30-34             140     90.4             110        94.5
35-39             135     86.7             100        91.4
40-44              91     85.1              75        90.1
45-49              70     79.0              60        85.1
50-54              55     77.4              45        82.6
55-59              50     74.9              36        75.3
All ages          896                      671

*Wechsler [7, pp. 103, 118].
†Wechsler [7, p. 222].

The most important doubts about standardi-
zation arise from comparing Wechsler’s Ta-
bles 39, 40, and 41 [7, pp. 222, 223] with
the description of the standardization sample 
given in his Table 7 [7, p. 103]. Tables 39 
and 40 purport to give the mean and SD for 
each age group on each subtest, and have been 
widely used for experimental comparisons, the 
construction of z-scores, and clinical work. 
The values given do not reflect the entire orig- 
inal standardization population, however, but 
only about 75 per cent of it. His Table 7 
shows 896 cases aged 20 to 59, whereas Ta- 
bles 39 and 40 are based on 671 cases in that 
age range. There is some loss of cases in every
age interval.

How was the reduced sample selected? Data
from Wechsler, summarized in Table 1, show 
that the samples of his Tables 39 and 40 are 
biased by the omission of persons of lower in- 
telligence. Adding the subtest means of Tables 
39 and 40 yields substantially higher means 
for most ages than are given in his Table 14 
for the entire standardization group. 
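
The comparison summarized in Table 1 can be checked with a few lines of arithmetic. The Python sketch below is illustrative only: the figures are those of Table 1, the Ns of 55 and 50 for the two oldest groups are assumed here so that the column agrees with the stated total of 896 cases, and all variable names are hypothetical.

```python
# Age groups and figures as given in Table 1 (after Wechsler [7]).
age_groups = ["20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50-54", "55-59"]

# Full standardization sample: N and mean score per age group.
# The Ns for 50-54 and 55-59 are assumed so that the column sums to 896.
full_n = [160, 195, 140, 135, 91, 70, 55, 50]
full_mean = [98.8, 95.9, 90.4, 86.7, 85.1, 79.0, 77.4, 74.9]

# Reduced sample underlying Tables 39 and 40: N and sum of subtest means.
reduced_n = [120, 125, 110, 100, 75, 60, 45, 36]
reduced_mean = [103.1, 100.8, 94.5, 91.4, 90.1, 85.1, 82.6, 75.3]

# Share of the original cases that survive into Tables 39 and 40.
retained_fraction = sum(reduced_n) / sum(full_n)

# How far the reduced-sample mean lies above the full-sample mean, by age group.
mean_shift = [round(r - f, 1) for r, f in zip(reduced_mean, full_mean)]

print(f"cases retained: {sum(reduced_n)} of {sum(full_n)} ({retained_fraction:.1%})")
for group, shift in zip(age_groups, mean_shift):
    print(f"{group}: reduced-sample mean exceeds full-sample mean by {shift}")
```

Under these assumptions every age group shows a higher mean in the reduced sample, which is the direction of bias described above.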

The discrepancies in the means and Ns are 
of some importance for research. Levi [3] and 
Olch [5], for example, used portions of these 
tables for subtest norms to compare with 
pathological groups. The application of the t
technique to test the significance of differences 
between outside data and these “norms” is 
doubtful, to say the least. Similarly, Barnett 
[2] and Alimena [1] used Wechsler’s Tables 
39 and 40 to construct z-scores, which must 
be regarded as dubious. 

The intercorrelations between subtests given
in Wechsler’s Table 41 [7, p. 223] are based
on 355 cases in the age range of 20 to 34. It
is clear that this table is not based on the en- 
tire original sample of 495 cases in these ages, 
but on the biased group used for Tables 39 
and 40, which similarly totals 355 cases in ages 
20 to 34. Furthermore, the intercorrelations 
of the Similarities subtest are based on only 
150 cases, aged 15 to 49, and the valuable 
Vocabulary subtest has no correlational data. 
These shortcomings seriously limit the value 
of a study such as that of McNemar [4] who 
evaluated the efficiency of short forms of the 
Wechsler by using the intercorrelations of 
Table 41. 

Another defect of the Wechsler manual is 
that it makes a change in procedure without 
calling attention to the change. In the second
edition, Wechsler [6] presented formulas for
obtaining IQ’s for ages beyond those given in
the tables. The third edition, as first pub- 
lished, contained the identical formulas [7, 
p. 225]. However, in newer printings of the 
third edition, at some time between 1945 and 
1947, these formulas were modified without 
even a footnote to indicate to the clinician or 
researcher that a change had been made. By 
comparing a 1945 printing with a 1947 print- 
ing, it may be seen that none of the constants 
in the formulas is the same, and that the vari-
ation ranges from 25 to 40 per cent. No ex-
planation has yet been given.


Variations in Administration and Scoring 


The materials, procedures, and scoring 
standards of the subtests have been subjected 
to variation from time to time. Although 
Wechsler stated, “In administering the tests 
it is absolutely essential that the examiner fol-
low the directions as given” [7, p. 171], he
has not always taken his own advice. A com-
parison of the second and third editions of
The Measurement of Adult Intelligence re-
veals changes in the administration and scor-
ing of some subtests without corresponding
changes in the tables of IQ’s. The Picture 
Arrangement, Object Assembly, and Picture 
Completion subtests are affected. 

In Picture Completion, for example, the sec-
ond edition directions state “Allow a maxi-
mum exposure of 15" per picture. If the sub-
ject does not indicate the missing part within
this time, score as a failure and continue with
the succeeding picture” [6, p. 172]. In the
third edition, the timing instruction is changed
to “Allow a maximum exposure of 15" to 20"
per picture” [7, p. 178]. The arbitrary in-
crease of 33 per cent in exposure time was not
accompanied by any changes in the weighted
scores.

An important difference in scoring is found 
in the Object Assembly subtest. The third 
edition changed the scoring of the “head” 
item. Previously, if an examinee reversed the
“ear pieces,” he lost two points and thus for-
feited any possible time credits. The third edi-
tion prescribes a one-point loss for the rever-
sal of the “ear pieces” and thus leaves the
examinee eligible for a time bonus. In the
writer’s experience, this change has led to gross
differences in the final score of the Object
Assembly, particularly with individuals whose 
IQ is above 110. 

The chief result of such changes is to invali- 
date any comparison of examinations carried 
out with the original procedures and scorings, 
and those done with the revised scoring. Stud- 
ies which used the second edition are not al- 
ways directly comparable with studies using 
the third edition, especially with respect to pat- 
tern analysis, which is highly susceptible to 
changes resulting from variations in a few sub- 
tests. The 31 research studies done before 1944 
may have many differences in scoring and in 
use of materials as compared with current 
practices. 


Conclusions 


Although the Wechsler-Bellevue retains its 
adequacy as a gross measure of general intelli- 
gence, its use as a diagnostic instrument and 
its application to research are seriously handi- 
capped by irregularities in standardization and 
scoring. Studies using the first or second edi- 
tion of The Measurement of Adult Intelli- 
gence are likely not to be comparable to stud- 
ies which used the third edition, especially with 
respect to the Arithmetic, Picture Arrange- 
ment, Picture Completion, and Object Assem- 
bly subtests. 

Even more serious has been the uncritical 
use of certain tables as standards for com- 
parison and bases for research. The means and 
SD’s of the subtests, and the intercorrelations
of the subtests, are given for a biased sample 
whose intelligence level is higher than that of 
the original standardization group. These ta- 
bles have been accepted uncritically by doc- 
toral candidates, inexperienced researchers, and 
seasoned veteran investigators alike. 







In criticizing the procedures and statistics 
of the Wechsler-Bellevue Scale, due considera- 
tion must be given to the conditions under 
which the test was constructed. Presumably, 
Wechsler had limited research resources and 
staff, yet performed a reasonably adequate 
standardization for his originally intended 
purposes—an adult intelligence test. The criti- 
cal comments are directed mainly to newer 
developments in the use of the Scale for clini- 
cal study and for research, developments that 
were not foreseen when the tests were first 
devised. 

The immediate practical conclusion is that
the extensive use of Wechsler’s detailed tables 
for research purposes should be discontinued 
until a new, thorough, and statistically sophis- 
ticated revision and restandardization of the 
Scale is accomplished. 


Received June 6, 1951 
Forms I and II of the Arthur Performance Scales
with Mental Defectives¹
Saul W. Gellerman 


28th General Hospital, APO 743, c/o Postmaster, New York, N. Y 


The present study is designed to help solve
a common clinical problem. Psychologists some-
times find that the alternate form of an intel-
ligence test, when used with a given clinical
population, yields scores which are markedly
discrepant from the scores of the original form.
This study is concerned with the clinically ob-
served tendency for the results of Revised
Form II of the Arthur Performance Scales to
be noticeably lower than the results obtained 
on Form I, when the subjects are mental de- 
fectives. 


Procedure 


Characteristics of the instruments. The sub- 
tests comprising the two forms of the Arthur 
scales are listed below. Note that four of the 
subtests in Form I occur in revised or alter- 
nate form in Form II. The tests are not listed 
in order of administration.


Form I
Knox Cube Test 
Seguin Form Board 
Healy Picture Completion Test 
Porteus Maze Test 
Casuist Form Board 
Mannikin Assembly Test 
Mare and Foal Test 
Kohs Block Design Test 


Form II 
Knox Cube Test 
Seguin Form Board 
Healy Picture Completion Test 
Porteus Maze Test 
Arthur Stencil Design Test 


The alternate forms of the Knox, Healy,
and Porteus tests differ primarily in that
whereas verbal instructions are given in Form
I, the Form II versions of these subtests were
standardized with pantomime instructions. In-
structions for the Seguin test are verbal for
both forms.

¹From the Psychological Laboratories of Lincoln
State School and Colony, William W. Fox, M.D.,
Superintendent. The writer wishes to express his
gratitude to Dr. William Sloan, Supervising Psy-
chologist, and Mr. William Hays, Psychologist II,
for their advice and assistance.

The Arthur scales are scored by the point
system. Direct comparison of the point score
for a given test with the point score for an-
other test is not possible, since the points are
weighted so as to increase the influence of the
more discriminative subtests in determining
the final score. The only method for compari-
son of performance on different subtests is to
use the rough Mental Age equivalents which
Arthur provides along with her point tables
for each subtest. Admittedly a certain error is
thereby introduced, since the MA units do not
represent the level of performance as accurate-
ly as do the points. Much of this error presum-
ably would be obviated through randomiza-
tion in continuous scoring. In the absence of
another unit of comparison, the raw data for
this experiment were these MA units, inter-
polated for greater accuracy.

Subjects. All subjects for this experiment 
were committed mental defectives at the 
Lincoln State School and Colony. They were 
divided into two experimental groups. Sub- 
jects were selected by the matched-pair method 
to insure minimal group differences. The fac-
tors controlled in selection were sex, chrono-
logical age, mental age, previous experience
with the Arthur scales, and etiological classi- 
fication. 

The technique of selection was as follows: 
all patients who had been tested with Form 
I of the Arthur scales in the period between 
December 1, 1948 and June 1, 1949, but who 
had not been tested subsequently with the 
Arthur scales, were selected from the Psy-
chology Department’s daily reports for that
period. The factor of previous experience with 
the instruments was thus controlled by limit- 
ing the experimental group to subjects whose 
most recent experience with the scales occurred 
during a six-month period at least one year 
prior to the experimental testing sessions. An 
interval of a year should normally obviate any 
practice effects. 

From the list of patients thus selected, two
groups of twenty subjects each were chosen.
Each group contained ten males and ten fe-
males. Each pair was matched as closely as
possible for chronological age, mental age
(based on the MA attained on the Form I
given a year previously), and sex. It was not
always possible to match pairs for etiological
classification, but the groups themselves did
not differ importantly with regard to this char-
acteristic. There were thus set up two equated
groups whose characteristics are summarized
in Table 1.


Table 1

Characteristics of the Experimental Groups

                                   Group I            Group II
Controlled Factor                Mean     SD        Mean     SD

I. Chronological Age (in months)
     Males                        201     66         171     37
     Females                      291    148         309    150
     Total                        246    122         240    134
II. Mental Age (in months)
     Males                         96     16          94     22
     Females                       90      9          88      2
     Total                         93     13          91     1[?]
III. Etiological Type
     Familial                      10                  7
     Organic*                     [?]                  5
     Undifferentiated†              4                  5
     No Classification              2                [?]
     Total                         20                 20

*Organic included post-traumatic, post-infectional, en-
docrine dysfunction, and epilepsy.

†Undifferentiated means insufficient etiological data
for differential diagnosis.


Note that while intergroup differences are
not important, there is a systematic intragroup
inconsistency with regard to CA. This is a
function of emphasis in selection. Subjects
were selected primarily for equivalence in MA.
For subjects under 15, careful CA matching
was also made. For subjects over 15, for
whom equivalent MA's would presumably
represent equivalent levels of functioning,
matching for CA was not so carefully ex-
ecuted. Hence, the IQ distributions for each
group may be presumed to be equivalent. The
t-test was performed for differences in CA and
MA between the groups, and in neither case
was a significant ratio obtained.
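
The t ratio for such a group comparison can be approximated from the summary figures of Table 1 alone. The sketch below is a minimal illustration in Python and rests on two assumptions not stated in the text: that the two groups are treated as independent samples (a matched-pair formula would need the individual scores), and that a pooled variance estimate is used. The function name is hypothetical.

```python
import math

def independent_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """t ratio for two independent groups from summary statistics (pooled variance)."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error

# Chronological age in months: group totals from Table 1, 20 subjects per group.
t_ca = independent_t(246, 122, 20, 240, 134, 20)
print(f"t for CA = {t_ca:.2f} with {20 + 20 - 2} degrees of freedom")
```

With 38 degrees of freedom a t of roughly 2.02 is needed at the 5 per cent level (two-tailed), so a ratio of this size is consistent with the reported absence of a significant difference in CA.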

Design. The study is designed to permit
the evaluation of the following statistical hy-
potheses: (a) The final scores attained by
mental defectives on both forms of the Arthur
scales will not differ significantly. (b) If the
two forms differ significantly, the discrep-
ancy: (i) is not due to differences in instruc-
tions, i.e., verbal and pantomime; (ii) is not
due to differences in the standardization norms
for the two forms; and (iii) is not due to the
difficulty of the test items.


These hypotheses are well suited for evalua- 
tion by means of the techniques of analysis of 
variance and partial correlation. Analysis of 
variance may be used in this case to estimate 
the significance of the discrepancy between 
results for the forms as a whole and for the 
comparable subtests, while the influence of 
extraneous factors (individuals, order of ad-
ministration) on the discrepancy also may be
evaluated. Partial correlation may be used in 
this case to determine whether any subtest or 
combination of subtests acts to increase the 
over-all discrepancy between the forms. 

To meet the conditions of analysis of vari-
ance, the experiment was designed to contain
two groups of mental defectives, equated for
important characteristics, to which both forms
of the test would be administered. To rule
out “practice effect” in the final results for
both groups as a whole, the order of adminis- 
tration for one group was the reverse of the 
order used for the other. In Group I the order 
was Form I — Form II, while in Group II 
the order was Form II — Form I. The in- 
terval between the two administrations for 
each individual varied between four and five 
weeks. All tests were administered by one 
examiner (the writer), in strict accordance 
with the instructions given by Arthur in her
manuals [1, 2].
The F-tests were designed after the manner
described by Grant [3]. Each is essentially
a two-by-two analysis of variance in which
columns represent order of administration,
rows represent session, and interaction repre-
sents the instruments themselves, as illus-
trated:

                       Session
             Arthur I         Arthur II
   Order
             Arthur II        Arthur I
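
The logic of this layout, in which the difference between the instruments appears as the interaction of order and session, can be shown with the cell means alone. The following Python sketch is illustrative and is not a reproduction of Grant's procedure: it computes only the three contrasts, not the F ratios, and the scores are invented for the purpose.

```python
from statistics import mean

def crossover_contrasts(g1_s1, g1_s2, g2_s1, g2_s2):
    """Contrasts among the four cell means of a two-group, two-session design.

    Group I takes Form I in session 1 and Form II in session 2;
    Group II takes Form II in session 1 and Form I in session 2.
    """
    a, b, c, d = mean(g1_s1), mean(g1_s2), mean(g2_s1), mean(g2_s2)
    order_effect = (a + b) / 2 - (c + d) / 2    # Group I minus Group II
    session_effect = (a + c) / 2 - (b + d) / 2  # session 1 minus session 2
    form_effect = (a + d) / 2 - (b + c) / 2     # Form I minus Form II (the interaction)
    return order_effect, session_effect, form_effect

# Hypothetical MA scores in years, invented for illustration only.
group1_form1 = [8.0, 9.0, 7.0]  # Group I, session 1
group1_form2 = [6.0, 7.0, 5.0]  # Group I, session 2
group2_form2 = [6.5, 7.5, 5.5]  # Group II, session 1
group2_form1 = [8.5, 9.5, 7.5]  # Group II, session 2

order, session, form = crossover_contrasts(
    group1_form1, group1_form2, group2_form2, group2_form1
)
print(f"order: {order}, session: {session}, form (interaction): {form}")
```

Because each form occupies one diagonal of the two-by-two layout, the diagonal contrast isolates the form difference while the row and column contrasts carry order and session.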


Results 


The results of the experimental testing ses- 
sions are summarized in Table 2. The first 
analysis was performed with final MA scores 
as data. The F-ratio for interaction was sig- 
nificant beyond the 1 per cent level of confi- 
dence. The F-ratios for order and session were 
not significant. Mean MA for Form II was 
lower than mean MA for Form I. 


Table 2

Mental Age Scores Attained by Forty Mental Defectives on Subtests of
Forms I and II of the Arthur Performance Scales
(Data in Years)

                       Form I             Form II
Subtest              Mean     SD        Mean     SD
Knox                 6.57    1.79       7.88    3.19
Seguin               9.58    2.22       8.21    4.06
Healy                7.59    2.20       5.03    0.79
Porteus              7.01    1.77       7.00    2.49
Stencil                                 5.99    0.78
Casuist              9.06    3.17
Mannikin             8.71    4.97
Mare and Foal       10.11    3.27
Kohs                 6.62    1.06
Final MA             8.11    4.17       6.47    3.20








Analyses of variance were performed in the 
same manner for each of the four comparable 
subtests. In the case of the Porteus Maze 
Test, there was no significant difference be- 
tween the results for the two forms. The 
Form I versions of the Healy and Seguin tests
were each significantly higher than their Form 
II counterparts, with extraneous factors being 
insignificant in both cases. The Form II ver- 
sion of the Knox Cube Test was significantly 
higher than its Form I counterpart, with ex- 
traneous factors again insignificant. 


In the statistical tests that follow, the Form
I MA is accepted as the criterion, or pre-
ferred estimate of performance ability. Form
I is an accepted instrument for evaluating non- 
verbal ability in children and subjects for 
whom adult performance tests are too difficult. 
The results of Form I are found to agree more 
closely with the results of similar tests on 
mental defectives than do the results of Form
II. For purposes of this study Form I is re-
garded as the criterion against which Form
II will be compared. The purpose of the pro- 
cedures that follow is to assess the possibility 
that certain Form II subtests contribute more 
than others to the over-all discrepancy between 
the forms. 


The product-moment correlation between
the final MA scores for the two forms was
+.136. This is not a significant correlation. In
Table 3 are presented the product-moment cor-
relations between the Form I MA and each
Form II subtest MA.











Table 3

Product-Moment Correlations with Form I MA
(N = 40)

Form II Subtest                        r
Knox Cube Test                       +.176
Seguin Form Board                    +.300
Healy Picture Completion Test        +.004
Porteus Maze Test                    +.360*
Stencil Design Test                  -.102

*Significant at 5 per cent level of confidence.
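
The significance of a product-moment correlation is commonly judged by converting r to a t ratio with N - 2 degrees of freedom. The Python sketch below applies that standard conversion to three of the reported coefficients. It is offered only as an illustration: the text does not state which test the writer used, nor whether it was one- or two-tailed, and the function name is hypothetical.

```python
import math

def t_from_r(r, n):
    """t ratio for testing a product-moment correlation against zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r**2)

N = 40
# Correlations with the Form I MA, as reported in the text and in Table 3.
reported = {"Form II final MA": 0.136, "Seguin": 0.300, "Porteus": 0.360}

for label, r in reported.items():
    print(f"{label}: r = {r:+.3f}, t = {t_from_r(r, N):.2f}, df = {N - 2}")
```

For 38 degrees of freedom the tabled 5 per cent values of t are about 2.02 (two-tailed) and 1.69 (one-tailed).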


Partial correlations were then calculated, in
which the relationship between Form I and
Form II was estimated with the effects of in-
dividual subtests, or combinations of subtests,
partialled out. It was found that partialling
out the effects of any single subtest did not
produce appreciable changes in the
Form I-Form II relationship. However, partialling
out the effects of pairs of Form II subtests
produced significant changes. When the effects
of the Knox and Stencil tests were partialled
out, the Form I-Form II correlation increased
to +.398. Partialling out the Healy and
Stencil tests raised the correlation to +.257.
On the other hand, partialling out the effects
of the Seguin and Porteus tests lowered the
Form I-Form II correlation to +.065. The
highest correlation between Form I and Form
II was obtained by partialling out the combined
effects of the Knox, Stencil, and Healy tests.
This raised the correlation to +.45.
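
The partialling procedure rests on the usual first-order formula, which removes from two variables the variance each shares with a third. The Python sketch below is a hedged illustration rather than a recomputation of the values above: the correlation between the Stencil subtest and the Form II final MA is not reported in the text, so the figure used for it is purely hypothetical, as are the names.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# x = Form I MA, y = Form II final MA, z = Stencil Design MA.
r_form1_form2 = 0.136     # reported in the text
r_form1_stencil = -0.102  # Table 3, negative as stated in the Discussion
r_form2_stencil = 0.50    # hypothetical: not reported in the source

adjusted = partial_r(r_form1_form2, r_form1_stencil, r_form2_stencil)
print(f"Form I-Form II correlation with Stencil partialled out: {adjusted:+.3f}")
```

Partialling out a pair of subtests amounts to applying the same formula a second time to the first-order partials, which is why the full matrix of intercorrelations would be needed to reproduce the reported figures.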


Discussion 


The implication of the results of the first 
F-test is that the final results of Form II are 
significantly lower than the results for Form 
I, and that this difference is due to differences 
between the forms themselves, rather than to 
extraneous factors. Since the Form I results 
are the preferred estimates of performance 
ability, this implies that Form II does not
yield as valid an estimate of performance
ability in mental defectives.


The results of the F-tests for the four
“comparable” subtests suggest a serious lack
of comparability between presumably equiva-
lent instruments. The possibility that differ-
ences in instructions account for the difference
is ruled out, since this would presumably effect
a uniform raising or lowering of results among
all four subtests. There is no effect at all on
the Porteus test, and the Knox results are
diametrically opposite to the Healy and Seguin
results. We must assume that subjects tended
to achieve at about the same level of difficulty
on both tests. Accordingly, the inference may
be drawn that the scoring norms differ in the
value assigned to equivalent levels of per-
formance. Such an inference is particularly
supported by the results of the analysis of the
Seguin scores. In this test the tasks and in-
structions are identical for both forms, al-
though the test materials are smaller on Form
II. The significant difference cannot be at-
tributed to differences in instruction or diffi-
culty, and apparently follows from normative
differences.

The F-tests thus demonstrate a significant 
difference between the forms, and rule out the 
possibility that the difference may be due to 
order of administration, individuals, or differ- 
ences in instruction. 

The correlational procedures demonstrate
that some of the discrepancy between the
forms is a function of the effects of certain of
the Form II subtests. The fact that these cor-
relations are as low as they are is no doubt a
function of the small N and the restricted
population from which the sample was drawn,
as well as any intrinsic differences between the
tests or their norms.


In Table 3 we find that only two subtests,
the Seguin and the Porteus, are significantly
correlated with the criterion. This relation-
ship also obtains in the partial correlations,
i.e., removal of the subtests from the battery
decreases the Form I-Form II correlation
whereas removal of the three other subtests
increases the correlation between the forms.
Specifically, the Knox test and the Stencil De-
sign test seem to contribute to the discrepancy
between forms, and the Healy test seems to
have no appreciable effect on the over-all cor-
relation. Exclusion of these three subtests
raises the Form I-Form II correlation to its
highest level. It would seem to follow that
the significant difference between the results
of the two forms is at least partially due to
the contribution of these three subtests to the
final Form II MA.

The negative relation between the Stencil
test and the criterion is apparently a function
of the difficulty of the test itself. The mean
MA for this subtest was 5.99, only 0.49 above
the minimum MA measured by the test, and
the standard deviation for this subtest was
0.78. This test became too difficult for nearly
all mental defectives at about the six-year
level. Except for subjects whose Form I MA’s
were below six, the Stencil test does not dis-
criminate between levels of ability in mental
defectives. The Stencil Design test in its pres-
ent form is standardized at too high a level for
mental defectives, and is of no value in esti-
mating their ability. This is not to question
the validity of the Stencil Design test.
Studies such as that by Magsdick [4] have
shown the test to be significantly correlated
with criteria such as the Wechsler-Bellevue
Scale when used with high-school students.
However, the inclusion of this test in a
battery for mental defectives is definitely not
indicated.

The same criticism may be made of Form
II of the Healy test. The mean MA for this
test was 5.03, 0.53 above the minimal MA
measured by the test, and the standard devia-
tion was 0.79. The mean for Form I of this
subtest, on the other hand, was 7.59, with a
standard deviation of 2.2. Arthur points out
in her manual [2] that Form II of the Healy
test is not valid for use with rural children,
since it requires a knowledge of urban culture.
The subjects for this study were not primarily
rural or urban. Presumably, however, the
mental defective is unlikely to absorb the more
complex aspects of any culture. This subtest
in its present form is unsuited for accurate
discrimination among mental defectives, re-
gardless of geographic origin.
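
The crowding of scores against the lower limit of these two subtests can be expressed as the distance of the group mean from that limit in standard deviation units. The Python sketch below is illustrative only: the means and SDs are those of Table 2, the lower limits are derived from the differences quoted in the text rather than from Arthur's manual, and the function name is hypothetical.

```python
def sds_above_floor(mean_ma, sd, floor_ma):
    """Distance of the group mean from the lowest scorable MA, in SD units."""
    return (mean_ma - floor_ma) / sd

# Means and SDs in years from Table 2. The floors are derived from the text,
# which places the Stencil mean 0.49 and the Healy mean 0.53 above the minimum.
form2_subtests = {
    "Stencil Design": (5.99, 0.78, 5.99 - 0.49),
    "Healy (Form II)": (5.03, 0.79, 5.03 - 0.53),
}

for name, (mean_ma, sd, floor_ma) in form2_subtests.items():
    distance = sds_above_floor(mean_ma, sd, floor_ma)
    print(f"{name}: floor = {floor_ma:.2f}, mean lies {distance:.2f} SD above it")
```

A mean lying well under one standard deviation above the lowest obtainable score leaves little room for the subtest to separate the less able subjects from one another.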

The mean MA for Form I of the Knox 
Cube test was 6.57, with a standard deviation 
of 1.79. The mean MA for Form II of this 
subtest was 7.88 with a standard deviation of 
3.19. The discrepancy in dispersion of scores 
for the two distributions reflects the compara-
tive unreliability of one of the forms as a dis-
criminative instrument. Since Form I of this
subtest correlates significantly with the criteri- 
on (+ .302), while the Form II version does 
not (+ .176), the reliability of the Form II 
version may be questioned. 


The statistical analyses thus suggest that the
discrepancy between the two forms can be due
to differences in the scoring norms between
comparable subtests, and to the effects of the
Knox, Stencil, and Healy results upon the final
Form II MA. Undoubtedly, the latter factor 
is to a considerable extent a function of the 
former. It would seem that the primary 
reason for the inadequacy of Form II in ac- 
curately estimating levels of ability in mental 
defectives lies in the standardization data for 
the form. 


Arthur raises the point [2] that the Form
II norms may be standardized at a higher level
than the levels of ability they purport to
measure. This suspicion is certainly borne out
in the case of mental defectives. Not only are
parts of Form II too difficult for adequate
differentiation among persons of inferior abili-
ty, but the Form II norms give lower credit
than do the Form I norms for presumably
equivalent performances.


It is important to bear in mind that these 
findings should in no way prejudice the use of 
Form II with nondefective subjects. The re- 
sults of this study are applicable to the de- 
fective group alone. When a suspected mental 
defective is being dealt with, however, it is 
suggested that Form II of the Arthur scales 
is not an accurate instrument for the estima-
tion of the subject’s capacity. No doubt a
careful restandardization of the Form II 
norms, including an adequate sampling of the 
defective range, would increase the usefulness 
of the instrument with this clinical group. 


Summary and Conclusions 


1. Two groups of twenty mental defec-
tives, equated for important characteristics,
were administered Forms I and II of the
Arthur Performance scales.

2. Forms I and II of the Arthur Perform-
ance scales are found to yield significantly dis-
crepant Mental Age scores when used with
mental defectives. The Form I MA is to be
preferred as the more accurate estimate of the
two.

3. The discrepancy seems to result from
the operation of two factors (which are not
mutually exclusive). They are (a) differences
in the scoring norms for comparable subtests
and (b) the excessive difficulty or unreliability
of certain of the Form II subtests.

4. The Stencil Design test and the Healy
Picture Completion test (Form II) are too
difficult for accurate estimation of levels of
ability with mentally inferior subjects. Form
II of the Knox Cube test yields unreliable
results when used with mental defectives.

5. Form II of the Arthur Performance
scales, in its present form, is not an accurate
instrument for the estimation of levels of
ability with mentally inferior subjects.

6. This study suggests that the usefulness 
of Form II with mental defectives could be 
increased by a restandardization including a 
more adequate sampling of the defective range. 


Received May 2, 1951. 
Reaction Time Characteristics of the
Rorschach Test 


Joseph D. Matarazzo and Ivan N. Mensh 


Washington University School of Medicine 


Although much has been written about the 
clinical, diagnostic usefulness of the time vari- 
able in the Rorschach test [1, 2, 6, 10], little 
has been done in the way of systematic in- 
vestigation. However, recent studies have ex- 
amined reaction time as a significant test vari- 
able. These include studies by Siipola [11], 
and by Dubrovner and his co-workers [5], on 
the effects of color in Rorschach behavior as 
reflected in reaction time. 


In these studies the experimental design pro- 
vided for presentation of two versions of the 
same Rorschach stimuli. In Siipola’s study 
there were 20 discrete units which were “liter-
ally cut out” of the five colored cards of the
standard Rorschach series. The second study, 
by Dubrovner et al., used the ten standard 
Rorschach cards. Both studies employed a 
colored and an achromatic version (photo- 
graphs of the colored sets) of the stimulus 
blots. Dubrovner and his colleagues found 
that “there is no confirmation for the assump- 
tion that color affects reaction time.” Siipola, 
on the other hand, used only particular areas 
instead of the whole blot and found that “the 
presence of hue in the stimulus has the general 
effect of increasing the time taken by the sub- 
jects to give their first conceptual responses.” 
The conclusions of these experiments are not 
necessarily contradictory, however, since 
Siipola demonstrates that apparently it is not 
the presence of color per se which accounts for 
the increase in reaction times to the blots but, 
rather, the presence of color which is incon- 
gruent with the particular shape of the blot; 
i.e., a “pink-colored” shape suggesting a “bear” 
would yield longer reaction times than the 
same shape either in achromatic color or brown 
color. Both studies have added to our knowl- 
edge regarding the effects of hue on reaction 
time. 




Another recent study, more closely related 
to the present one, is that of Beck et al. [3]. 
From a sample of 157 normal subjects these 
investigators were able to derive group aver- 
ages on some 30 Rorschach variables. These 
data afford norms against which to compare 
statistics on other groups or against which to 
evaluate the findings on an individual. One 
of the variables analyzed in Beck’s study was 
reaction time. In the present study, reaction 
time to a card is defined as the period of time 
elapsing between the examiner’s presentation 
of the card to the subject and the first scorable 
response given by the subject to this card. A 
scorable response as used here has been defined 
by Rapaport [9, p. 128] as a response which 
has “a definite area, a definite content, a de- 
terminant, and in most cases a form level.” 

Our study was designed to test two hy- 
potheses about reaction time in the Rorschach 
test: (a) different clinical groups behave 
similarly, in terms of RT to the same card 
(intergroup comparisons), i.e., the ten cards 
have a unique stimulus value for RT and this 
value will be the same regardless of the kind 
of general adjustment of the subject respond-
ing to the individual cards; and (b) the ten
cards are similar in stimulus value with respect 
to RT for the members within a diagnostic 
group (intragroup differences), i.e., for any 
one group, the individual cards are of such 
stimulus nature that they elicit similar (in a 
statistical sense) times per first response. 
Corollary studies of the effect of sex and age 
on RT also were included because of their 
possible relevance. 


Procedure 


The sample studied was a total sample during
a two-year period, consisting of 201 patients
given psychological examinations in the
neuropsychiatric service of a midwestern
inpatient hospital. There were 100 psychoneurotic,
74 psychotic, and 27 organic patients. These
three groups will be referred to in the tables
as PN, PS, and OR, respectively. Only those
patients were used in the study on whom there
was agreement between the psychiatric and
psychological staff on the major nature of the
patient’s illness. In addition to comparison of
the groups for mean age, IQ, education, R
(total number of responses), T/1R (average
time per first response), T/R (average time
per response), and T (total time), the data
on reaction time were analyzed by card (intra-
and intergroup comparisons), age, and sex.
The mean reaction times per card of the three
diagnostic groups also were compared with the
recently published data (Beck et al. [3]) on
157 normals.


Table 1
Characteristics of the Samples

               Neurotic         Psychotic         Organic
               (N = 100)        (N = 74)         (N = 27)      Significant t ratios
Variable      Mean     SD      Mean     SD      Mean     SD    (probability level)*
Age          28.89   10.03    27.57   10.83    37.78   14.64   PN-OR .01; PS-OR .01
IQ (W-B)    108.25   16.35   102.54   19.60    93.38   16.93   PN-OR .01; PS-OR .05
Education    12.44    3.40    12.58    3.24    10.75    3.98   .05
R            33.03   21.06    48.72   48.06    24.30   23.07   PN-PS .05; PS-OR .01
T/R          48.39   31.01    47.34   36.89    67.33   43.16   PN-OR; PS-OR
T/1R         20.38   16.80    19.47   18.38    25.96   21.29
Total time   18.73   15.86    31.14   31.14    21.06   14.83   PN-PS .01; PS-OR .05

*Only probability values at the 1 per cent or 5 per cent levels are reported.


Results 


Table 1 presents the characteristics of the
three samples. The values of T/1R, T/R, and
total time were obtained from the individual
protocol summaries. T/1R is derived from the
individual protocols, therefore, and not by
summing the reaction times per card for all
individuals within a group and dividing by
10. For this reason the mean values of T/1R
for any one group as given in Table 1 will not
exactly equal the mean of the reaction times
for all ten cards as might be computed from
Table 2. The differences are slight ones and
result from “rounding-off” errors in computing
T/1R for any one individual when making up
the summary table.
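
The distinction between the two ways of arriving at a group value for T/1R can be illustrated with a short sketch. It assumes the card means as they appear in Table 2 and the T/1R means as they appear in Table 1; the variable and function names are illustrative only and are not part of the original analysis.

```python
# Mean reaction time per card (seconds), transcribed from Table 2.
card_means = {
    "PN": [23.02, 18.34, 20.06, 21.92, 14.40, 22.99, 24.94, 15.93, 23.79, 21.25],
    "PS": [17.93, 16.86, 13.11, 21.11, 14.68, 24.03, 22.82, 16.97, 31.18, 24.86],
    "OR": [16.30, 19.96, 27.15, 25.48, 22.70, 36.96, 37.73, 21.04, 35.30, 28.54],
}

# Mean T/1R per group (seconds), transcribed from Table 1.
protocol_t1r = {"PN": 20.38, "PS": 19.47, "OR": 25.96}


def mean_of_card_means(values):
    """Average the ten card means for one group."""
    return sum(values) / len(values)


# Gap between the Table 2 route and the Table 1 (protocol summary) route.
gaps = {}
for group, values in card_means.items():
    table2_mean = mean_of_card_means(values)
    gaps[group] = table2_mean - protocol_t1r[group]
    print(f"{group}: mean of card means = {table2_mean:.2f}, "
          f"Table 1 T/1R = {protocol_t1r[group]:.2f}, gap = {gaps[group]:+.2f}")
```

The two routes give nearby but not identical values, which is the point made in the paragraph above.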

It can be seen from Table 1 that the three
groups differ in all variables except T/1R.
Average time per first response, therefore, did 
not differentiate the groups. Comparison of 
the groups reveals that the organics are 7.89 
and 10.21 years older than the neurotics and 
psychotics, respectively. These differences are 
significant at the 1 per cent level of confidence. 
Probability values are given only when the dif- 
ferences are significant at the 5 per cent or 1 
per cent level of confidence. Further, the or- 
ganic group does not earn as high a mean IQ 
as do the other 2 groups. This may be the re- 
sult, at least in part, of lowered functioning 
due to the cerebral pathology of the former 
group. The neurotic and psychotic groups also
have more education. These significant differ-
ences among the three samples on age, IQ, and
education variables suggest that comparisons of 
these groups on other variables may be ques- 
tionable. However, the following discussion 
will attempt to point out that these differences 
among the three groups apparently have little 
influence on reaction time to the individual 
Rorschach cards. This is especially true for 
the age variable. 
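
As an illustration of how a group comparison of this kind can be checked from the summary figures alone, the sketch below computes a t ratio for the age difference between the organic and psychotic groups from the means, standard deviations, and Ns in Table 1. The article does not state which form of the t test was used; the pooled-variance form and the function name `pooled_t` are assumptions made here for illustration.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t for two independent groups from summary statistics.

    Assumes a pooled variance estimate; returns (t, degrees of freedom).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Age: organic group (N = 27) versus psychotic group (N = 74), from Table 1.
t_age, df_age = pooled_t(37.78, 14.64, 27, 27.57, 10.83, 74)
print(f"t = {t_age:.2f} with {df_age} degrees of freedom")
```

A t of this size with about 100 degrees of freedom exceeds the usual two-tailed 1 per cent critical value (roughly 2.6), which agrees with the significance level reported for this comparison.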

The statistically significant reduction in R 
in the organic group confirms the clinical 
“rule of thumb” that organic patients are less 
productive than are psychotics. The difference 
between the organics and neurotics while in 
the expected direction is not statistically sig- 
nificant. The significant difference of 15.69 
in number of responses between the psychotic 
and neurotic groups also is not inconsistent 
with clinical expectation. The longer average 
time per response, T/R, of the organic patients
as compared to the two other groups again is
in keeping with the clinically held hypothesis
of the “dulled” characteristic of this group.
The variable of total time, T, necessary for
completing the free-association to the ten cards
may not be useful since it reflects directly the
other three variables, R, T/1R, and T/R, just
discussed. In a glance, however, it does reveal 
the greater length of time necessary for ad- 
ministration of the Rorschach experiment to 
psychotics, and also their greater variability 
in this regard, as compared to either the 
neurotic or organic group. 


Thus it can be seen from Table 1 that,
compared to the neurotic and psychotic groups
on the Rorschach variables studied, the or-
ganics are less productive and are slower in the
average time per response. The psychotics, on
the other hand, give more responses, take a
longer total time with the blots, and are more
variable than the neurotics. These findings are
in keeping with clinical experience. The lack
of significant differences among the three
groups in average time per first response,
T/1R, however, is not consistent with the
generally held view that organic patients are
slower in mean time per first response.


The most striking aspect of the findings
reported in Table 1 is the extreme variation
reflected in the large standard deviations for the
Rorschach characteristics studied. This is the
result of skewing toward the high end. That
this is not a unique characteristic of the
particular samples used in this study but
represents the general state of affairs for many, if
not all, Rorschach variables can be seen from
an examination of the means and sigmas
reported by Beck [3] on the Rorschach variables.
In accordance with expectations from clinical
experience, therefore, marked variability is the
rule and not the exception when dealing with
grouped Rorschach data.¹


Examination of the significant differences in 
reaction times, by card, among the three pa- 
tient groups represented in Table 2 reveals 
why the mean T/1R, the average of the
reaction times to the ten cards for any particular
group, did not differ from group to group.
Among the three patient groups there are, of
the 30 intergroup comparisons (three
comparisons each for each of the ten cards), only
four significant differences. These four
differences were not seen when the T/1R data were
combined into total group statistics which, in 
Table 1, masked the significance of the dif- 
ferences. Two of the four significant differ- 
ences result when the neurotics and organics 
are compared with the psychotic group on card 
3, the psychotic patients reacting most rapidly 
(13.11 seconds). The two remaining differences
among the patients result from the significantly
longer RT of the organics as compared to the
neurotics and psychotics on card 5. It is seen
from the intergroup comparisons among the
patient groups, therefore, that card 3 is “easy”
for the psychotics, while card 5 is “difficult”
for the organics.


¹The differences between and among the various
groups were tested for significance by means of the
t test. As seen in the present data and in a number
of other studies of the Rorschach test [7], skewed
distributions are frequent. Although the assumption
of normal distribution is violated, therefore, the
results of the t tests are at least suggestive of true
differences. The whole problem of treatment of
Rorschach data is under study by various investigators
[4, 7, 8].


Table 2
Reaction Time per Card and Intergroup Comparisons Among the Samples

        Neurotic      Psychotic       Organic    Beck's Normals*     Intergroup t Value Probabilities
Card    Mean     SD   Mean     SD   Mean     SD   Mean     SD  PN-PS  PN-OR  PN-BK  PS-OR  PS-BK  OR-BK
I      23.02  43.60  17.93  19.90  16.30  15.98  22.39   23.6
II     18.34  21.98  16.86  18.78  19.96  18.79  24.09   22.3                  .05            .05
III    20.06  28.68  13.11  11.59  27.15  31.07  25.58   24.4    .05                  .05    .01
IV     21.92  21.28  21.11  21.81  25.48  24.96  31.93   27.7                  .01            .01
V      14.40  19.40  14.68  15.00  22.70  18.35  20.86   23.3           .05    .05    .05    .05
VI     22.99  25.00  24.03  26.06  36.96  41.25  30.80   26.3                  .05
VII    24.94  33.93  22.82  28.06  37.73  40.80  30.80   27.4                                .05
VIII   15.93  14.52  16.97  30.30  21.04  21.02  21.40   17.1                  .01
IX     23.79  19.71  31.18  45.43  35.30  41.86  37.80   27.8                  .01
X      21.25  23.67  24.86  38.56  28.54  23.54  34.20   29.6                  .01

*Beck, Rabin, Thiesen, Molish, and Thetford [3].

Comparison of these three patient groups 
with Beck’s normals (30 more t tests) yields
twelve significant differences. These all occur
when normals and the neurotics and psychotics
are compared.² These findings and the one
significant difference between the neurotic and
psychotic groups on card 3 indicate that, in
terms of reaction time taken card by card (or
combining all ten cards together as shown by
mean T/1R in Table 1), the neurotic and
psychotic patients behave in a similar manner. In
other words, the ten Rorschach cards are of 
equal (equality defined operationally) stimulus 
value for reaction time for these two groups. 
This indicates that reaction time does not dis- 
criminate between neurotics and psychotics. 
The question may arise whether this lack of 
significant difference between the two groups 
is a group artifact and that it masks real 
differences which would be apparent were the 
reaction times of individual patients analyzed. 
This will be discussed in connection with the 
“mean fluctuation in time for the first response 
from card to card,” a variable discussed by 
Beck [3] in his study of a normal sample. 

On seven of the ten cards the normals react 
significantly more slowly than do the neurotics. 
There also is a similar trend on two of the 
remaining three cards. Likewise, the normals 
take a longer time reacting than do the psy- 
chotic patients, to each of the ten cards, with 
the differences statistically significant for five 
of the ten cards and trends in the same
direction for the other five cards. For both
patient groups these significant differences
distribute themselves in approximately equal
numbers over the achromatic and colored cards
so that hue or lack of it does not appear to
have influenced the shorter reaction times of
these two groups in reference to the longer
times taken by the normals. The distribution
of four out of the possible five (all except card
3) shorter reaction times of the neurotic group
on the colored cards, as compared to the normal
group, adds further evidence to the growing
literature questioning the view that “blocking”
(in terms of longer reaction times) on the
colored cards is one of the “signs” of “neurotic
color shock.” On the contrary, blocking as
here defined may be a “sign” of normal
behavior (as reflected in the comparison of our
data with those of Beck’s normal sample) and
appears in response to the achromatic as well
as the chromatic cards.


²Table 2 represents the results of 60 t tests of
significance. By chance alone less than one t would
be significant at the 1 per cent level and 2 to 3 t
values would be significant at the 5 per cent level.
The 6 significant differences obtained at the 1 per
cent level and 10 at the 5 per cent level, therefore,
represent greater than chance occurrence.
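
The chance expectation quoted in the footnote on the 60 t tests of Table 2 is simple arithmetic, sketched below. The treatment of the 5 per cent band as "between .01 and .05" versus "at or below .05" is an interpretation offered here to account for the phrase "2 to 3"; the variable names are illustrative.

```python
n_tests = 60  # t tests summarized in Table 2

# Expected number of tests reaching each level by chance alone.
expected_at_01 = n_tests * 0.01                    # at or below the 1 per cent level
expected_at_05_total = n_tests * 0.05              # at or below the 5 per cent level
expected_between = n_tests * (0.05 - 0.01)         # between the 1 and 5 per cent levels

# Counts of significant differences reported for Table 2.
observed_at_01 = 6
observed_at_05 = 10

print(f"expected at .01: {expected_at_01:.1f}, observed: {observed_at_01}")
print(f"expected at .05: {expected_between:.1f} to {expected_at_05_total:.1f}, "
      f"observed: {observed_at_05}")
```

Both observed counts are several times the chance expectation, which is the comparison the footnote draws.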

Table 2 further shows the lack of significant
differences between the normal and organic
groups. The differences in means for the ten
cards, although not significant, distribute
themselves about equally in favor of both groups,
i.e., the normals show longer reaction times to
six of the ten cards while the organics show
longer reaction times to the remaining four
cards. These statistical data are not in accord
with the usual clinical impression that organics
are the slowest subjects in reacting with a
first response, i.e., initial reaction time, to the
Rorschach cards. From Table 2 it may be
concluded that, in terms of reaction time,
neurotics and psychotics react similarly and,
on the other hand, normals and organics react
alike. Analysis of reaction time to individual
cards, therefore, suggests that there are
differences among groups in RT to these cards.
These differences vary from group to group
and card to card and indicate that group by
card interaction is present, i.e., different groups
appear to be differentially sensitive to certain
cards.

The means of the patient groups in Table 2
were analyzed further for the variables of
age and sex. There was no a priori reason
why these two variables should not influence
reaction time. The three age groups compared
were 14-25, 26-35, and 36-59. Intragroup
card by card comparisons on these two
variables yielded no significant differences. Age
and sex, therefore, do not appear to be
variables which influence reaction time to any
specific card.

Table 3 presents an intragroup analysis
(age and sex not differentially reported) of
the mean reaction times given in Table 2. It
represents the results of comparing each card
against each of the remaining nine cards
within any one group and shows the consistency of
RT within the four diagnostic groups. Only
those cards whose means differ significantly
have been included in Table 3. In the
responses of the neurotic group there are two
cards (5 and 8) which differ significantly from
cards 4, 6, 7, and 9. The mean reaction times
of 14.40 and 15.93 for cards 5 and 8,
respectively, are significantly shorter than the means
of the other four cards. Similar results are
found in the normal group for cards 5 and 8.
This finding is in keeping with both clinical
experience and recent experimental
verifications [3] and may be attributed to the easily
perceived “bat” of card 5 and the “animals”
of card 8. The psychotic group shows similar
results for card 5 but not for card 8. Their
shortest reaction time is on card 3. The
organics do not respond with short reaction times
to the same cards as these other groups but
show, instead, their shortest time per first
response on card 1. Thus, we find that each of
the four groups shows differences in reacting
to the ten cards. Furthermore, these
intragroup differences, although showing some
overlap, differ from group to group. Thus, the
neurotics react quickest to cards 5 and 8; the
psychotics to cards 3 and 5; the organics to card
1; and the normals to cards 1, 2, 3, 5, and 8.


Table 3
Probabilities of Significance of Differences in Intragroup Reaction Times, by Card

  Neurotic       Psychotic       Organic              Beck's Group
Cards†   p*    Cards    p      Cards    p      Cards    p      Cards    p
5-4     .05    1-9     .05     1-6             1-4     .01     5-4     .01
5-6     .01    2-9     .05     1-7             1-6     .01     5-6     .01
5-7     .01    3-4     .01     1-9             1-7     .01     5-7     .01
5-9     .01    3-6     .01     1-10    .05     1-9     .01     5-9     .01
8-4     .05    3-7     .01                     1-10    .01     5-10    .01
8-6     .05    3-9     .01                     2-4     .01     6-9     .05
8-7     .05    3-10    .05                     2-6     .05     7-9     .05
8-9     .01    5-4     .05                     2-7     .05     8-4     .01
               5-6     .01                     2-9     .01     8-6     .01
               5-7     .05                     2-10    .01     8-7     .01
               5-9     .01                     3-4     .05     8-9     .01
               5-10    .05                     3-9     .01     8-10    .01
               8-9     .05                     3-10    .01

*Only probability values at the 1 per cent or 5 per cent levels are reported.
†The first of the 2 cards represented has the shortest RT.


On the other hand, cards 4, 6, 7, and 9 for 
the neurotics; 4, 6, 7, 9, and 10 for the psy- 
chotics; 6, 7, 9, and 10 for the organics; and 
cards 4, 6, 7, 9, and 10 for the normals yield 
the longest reaction times. 

In general, then, there are both intra- and 
intergroup differences. These results indicate 
that the ten Rorschach cards are not similar 
in stimulus value with regard to the time 
elapsing between the presentation of the card 
and the first scorable response. Furthermore, 
the ten cards appear to divide into two groups 
with the shortest reaction times given to cards 
1, 2, 3, 5, and 8 and the longest reaction times 
to cards 4, 6, 7, 9, and 10. Even among the
“short” and “long” groups of five cards there
are differences among and within the four
samples in terms of reaction time to any one
card.


Table 4
Rank Order of Means of Reaction Time, by Group

Card    PN*    PS     OR     Beck
I        8      5      1      3
II       3      3      2      4
III      4      1      6      5
IV       6      6      5      8
V        1      2      4      1
VI       7      8      9      6.5
VII     10      7     10      6.5
VIII     2      4      3      2
IX       9     10      8     10
X        5      9      7      9

*The card with the shortest mean reaction time has
been given a rank of 1.

In Table 4 are shown the results of ranking
the mean reaction times per card from
lowest to highest in each of the four samples.
This ranking disregards the statistically
significant differences in reaction time among the
ten cards reported in Table 3 and deals only
with the numerical values of the means as
presented in Table 2. This procedure is
independent of the t test analyses reported above
and provides relative ranks of each of the ten
cards within the groups. These data show that
normals and neurotics react fastest to card 5
while the psychotics and the organics show
their shortest reaction times to cards 3 and 1,
respectively. A similar analysis can be made
for the nine remaining rank positions for any
one group or for all groups and these data are
presented in Table 4. The correlations of the
rank orders of all ten cards when each group
is compared with the others are: .64 for BK-PN,
.80 for BK-PS, .67 for BK-OR, .70 for
PN-PS, .55 for PN-OR, and .60 for PS-OR.
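
The rank-order correlations can be approximated with the familiar rank-difference formula, rho = 1 - 6(sum of d squared) / (n(n squared - 1)). The sketch below applies it to the ranks of Table 4 (each column runs from 1 to 10, with one tie in Beck's sample). The article does not say which formula was used or how the tie was handled, so the uncorrected formula here is an assumption.

```python
from itertools import combinations

# Rank of each card's mean reaction time (cards I through X), from Table 4.
ranks = {
    "PN": [8, 3, 4, 6, 1, 7, 10, 2, 9, 5],
    "PS": [5, 3, 1, 6, 2, 8, 7, 4, 10, 9],
    "OR": [1, 2, 6, 5, 4, 9, 10, 3, 8, 7],
    "BK": [3, 4, 5, 8, 1, 6.5, 6.5, 2, 10, 9],
}


def spearman_rho(a, b):
    """Rank-difference correlation without a correction for ties."""
    n = len(a)
    sum_d_squared = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


for first, second in combinations(ranks, 2):
    rho = spearman_rho(ranks[first], ranks[second])
    print(f"{first}-{second}: {rho:.2f}")
```

Under these assumptions five of the six coefficients round to the values given in the text; the BK-PN coefficient comes out at .63 rather than .64, which may reflect a different treatment of the tied ranks.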

Finally, reaction time for individual cases
is important to a clinical study. These, after
all, constitute the source of data with which
the clinician works. Beck [3, p. 274] has
recently introduced the concept of “mean
fluctuation in time per first response,” a variable
useful in interpreting individual records. This
measure is determined by “taking the
differences in first response time from one card to
the next; adding these; and dividing by nine
to get the mean.” For those individuals who
reject one or more cards the sum of the differ-
ences is divided by the number of cards used
in the computation, minus one.
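
The computation just described can be sketched as follows. Two points are assumptions made for illustration and are not stated in the text: the differences are taken as absolute values (otherwise they would largely cancel), and when a card is rejected the difference is taken between the nearest cards that were answered. The function name and the sample times are hypothetical.

```python
def mean_fluctuation(first_response_times):
    """Mean card-to-card change in time per first response.

    first_response_times: seconds for cards I through X in order,
    with None marking a rejected card.
    """
    used = [t for t in first_response_times if t is not None]
    if len(used) < 2:
        raise ValueError("at least two answered cards are needed")
    # Absolute change from each answered card to the next answered card.
    differences = [abs(later - earlier) for earlier, later in zip(used, used[1:])]
    # Nine differences for a full record; cards used minus one otherwise.
    return sum(differences) / (len(used) - 1)


# Hypothetical record with all ten cards answered.
full_record = [10, 14, 9, 20, 8, 25, 30, 12, 40, 22]
print(round(mean_fluctuation(full_record), 2))

# Hypothetical record with one rejected card.
record_with_rejection = [10, None, 9, 20]
print(mean_fluctuation(record_with_rejection))
```

For the full record the nine differences sum to 118 seconds, giving a mean fluctuation of about 13.1 seconds.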

Table 5 presents the results of this analysis
for our three groups and for the sample
reported by Beck and his co-workers. As in that
study, the present data also are reported in
terms of a frequency distribution. The number
of individuals showing a mean fluctuation in
time per first response is given for each
4-second interval from zero to 114 seconds. The
right-hand side of the table gives the same data
in terms of percentages, permitting compari-
sons among the four groups despite differences
in N among these groups.


Table 5
Mean Fluctuation in Time Per First Response

                      Number                      Per cent
Range (sec.)      PN    PS    OR  Beck      PN      PS      OR    Beck
110-114                              1                             .64
105-109
100-104
95-99
90-94              1                         1
85-89              1                 1       1                     .64
80-84                                2                            1.27
75-79                                1                             .64
70-74
65-69                                1                             .64
60-64              1     1           4       1    1.35            2.55
55-59              1     1                   1    1.35
50-54              2           2     4       2            7.69    2.55
45-49              2     1           6       2    1.35            3.82
40-44              1     2     3     4       1    2.70   11.54    2.55
35-39              1     2           8       1    2.70            5.10
30-34              6                 7       6                    4.46
25-29              4     2          19       4    2.70           12.10
20-24              7    10     3    15       7   13.51   11.54    9.55
15-19              8     6     5    23       8    8.11   19.23   14.65
10-14             19    20     6    25      19   27.03   23.08   15.92
5-9               34    18     5    26      34   24.32   19.23   16.56
0-4               12    11     2    10      12   14.86    7.69    6.37
Total            100    74    26   157     100   99.98  100.00  100.01
Mean           16.55 14.70 18.90 23.26
SD             16.70 12.40 14.45 18.75

Examination of the group means at the 
bottom of Table 5 reveals values of 16.55, 
14.70, 18.90, and 23.26 seconds, for the 
neurotic, psychotic, organic, and normal 
groups, respectively. Once again, and this 
time as a result of examining individual cases, 
the normal group shows the largest mean in 
the measure of time per first response. Al- 
though the mean of the normal group is signifi- 
cantly different (1 per cent level) only in re- 
lation to the mean of the psychotic group, the 
trend is similar in relation to the other two 
groups. If Beck’s speculation [3, p. 274] re-
garding the interpretation of this statistic is
correct (“an uneven tempo of adjustment
to the varied conditions represented by the
several figures”), then it appears from the
present data that the patient groups are more 
stable in their behavior in regard to reaction 
time than are the normals. 

Because of the skewed distributions, medians 
rather than means may be less biased in the 
evaluation of these data. The percentages at 
the right of Table 5 permit such measures. 
By cumulating percentages upward for each 
group we find the 50th percentile for the three 
patient groups in the range 10—14 seconds, 
while for the normals this same measure falls 
in the 15-19 seconds range. 
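
Locating the median interval by cumulating upward from the lowest interval can be sketched as below. To avoid rounding problems the sketch cumulates the frequencies of Table 5 rather than the percentages, which is equivalent; only the five lowest intervals are needed because every group passes its halfway point within them. The function name is illustrative.

```python
# Lowest five intervals of Table 5, in ascending order.
intervals = ["0-4", "5-9", "10-14", "15-19", "20-24"]

# Frequencies in those intervals and total N for each group, from Table 5.
counts = {
    "PN": [12, 34, 19, 8, 7],
    "PS": [11, 18, 20, 6, 10],
    "OR": [2, 5, 6, 5, 3],
    "Beck": [10, 26, 25, 23, 15],
}
totals = {"PN": 100, "PS": 74, "OR": 26, "Beck": 157}


def median_interval(group):
    """Return the interval in which the cumulative frequency reaches 50 per cent."""
    cumulative = 0
    for label, frequency in zip(intervals, counts[group]):
        cumulative += frequency
        if 2 * cumulative >= totals[group]:
            return label
    raise ValueError("median lies above the intervals listed")


for group in counts:
    print(group, median_interval(group))
```

The three patient groups reach the 50th percentile in the 10-14 second interval and Beck's normals in the 15-19 second interval, as stated above.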


Summary 


The present investigation was designed to 
study the reaction time variable of the Ror- 
schach test with special emphasis on card-by- 
card comparisons within and between diag- 
nostic groups. The sample studied was a total 
sample during the two-year period, consisting 
of 201 patients given psychological examina- 
tions in the neuropsychiatric service of a mid- 
western hospital. The sample included 100 
psychoneurotic, 74 psychotic, and 27 organic 
patients. In addition to comparison of the
groups for mean age, sex, education, IQ, R
(total number of responses), T/R, T/1R, and
total time, the data on reaction time were
analyzed by card (intra- and intergroup com-
parisons), age, and sex. The mean reaction
times per card of these three patient samples
also were compared with the published data 
of Beck and his co-workers who used a sample 
of 157 normals. 

The analyses showed large intra- and inter-
group variability in reaction times. These data
suggest that significant variability from patient 
to patient is the rule and not the exception. 
However, as expected from earlier studies, it 
was found that patients suffering from organic 
pathology are significantly less productive and 
slower in their average time per response than 
are psychotic or neurotic patients. The psy- 
chotics give more responses, take a longer total 
time with the blots, and are more variable than 
the neurotics. These findings have been sup- 
ported by clinical experience. The generally 
held view that organic patients are slower in 
average time per first response, T/1R, was not
confirmed by this study. 

Intergroup comparisons of reaction time per
card led to the conclusion that, in terms of 
reaction time, neurotics and psychotics react 
similarly and, on the other hand, normals and 
organics react alike. There were no age or 
sex differences found within the three patient 
groups when these variables were analyzed 
card by card. Also, the concept of “neurotic 
color shock” frequently inferred from longer 
reaction times to the colored cards was not 
borne out by this study. 

Although there were differences among the 
four groups, intragroup comparisons of mean 
reaction times to each of the cards revealed 
that the ten Rorschach cards are not similar 
in stimulus value in regard to this variable. 
The ten blots appear to divide into two groups 
with the shortest reaction times given to cards 
1, 2, 3, 5, and 8 and the longest reaction times 
to cards 4, 6, 7, 9, and 10. 

Finally, an analysis of “mean fluctuation in 
time per first response” (Beck et al. [3]), 
taking one individual at a time within any one 
group, led to the finding that this measure, like 
the group measures reported above, yielded a 
higher mean for the normal group than for 
any of the patient groups. Although this find- 
ing was significant only for the psychotic 
group, a similar trend was present for the 
neurotic and organic groups. Beck [3] has 
written that normals show “an uneven tempo 
of adjustment to the varied conditions represented
by the several [Rorschach] figures.”
It would seem, then, that these normals show 
more unevenness of tempo than do the neurot- 
ics, psychotics, and organics of the present 
study. 

In conclusion, the present investigation adds 
to the normative data available to the clinician 
and points to the need for further study of 
test variables. The data presented here afford 
comparisons with material from other samples, 
for example, Beck’s normals and his sample 
of schizophrenic patients [3] now under study. 
Cross-validation is essential in studies of this 
nature. 


Received May 14, 1951.


The Influence of Method of Administration and
Sex Differences on Selected Aspects
of TAT Stories¹


Although the TAT is widely used at
present in clinical research and practice, there 
have been comparatively few systematic studies 
to provide us with information relative to 
norms and a more adequate standardization 
of the test [8]. A few studies have attempted 
to secure some normative data concerning the 
mood of TAT stories [1, 5, 6], frequency of 
themes [4, 9], and the identification of persons 
and objects perceived in the test cards [5, 9]. 
There is a dearth of normative material per- 
taining to sex differences, age differences, socio- 
economic status, and related factors. Different 
methods of administration and interpretation 
have been utilized with little attention paid 
to the possible influence such diverse methods 
may have on the significance of the results 
obtained. Even though the test has been used 
advantageously in the clinical situation to 
understand important aspects of the individ- 
ual’s personality, some normative frame of 
reference is essential for the most efficacious 
use of the test. This is particularly important 
since the test is being used in an extremely 
subjective fashion. Although normative
research is laborious, particularly in the case of
projective tests, such research is a necessary
prerequisite to more reliable usage of
psychological techniques.


¹Presented in part at the Twenty-third Annual
Meeting of the Midwestern Psychological Assn.,
Chicago, April 27, 1951.


The Present Research 


The present series of investigations grew out 
of the interests of the authors in securing 
normative data for the TAT cards. In planning
such research, however, it was apparent
that certain problems, heretofore not studied 
systematically, should be evaluated before a 
normative appraisal was begun. The two prob- 
lems considered were the method of adminis- 
tration and the influence of the sex of the ex- 
aminer and sex of subject on TAT stories. A 
plan of research was, therefore, designed 
which would throw some light on these prob- 
lems as well as providing us with material for 
tentative norms. 

In order to accomplish this objective two 
male and two female examiners were used to 
test comparable groups of male and female 
subjects. Furthermore, each examiner tested 
an approximately equal number of subjects of 
each sex under two different methods of ad- 
ministration. The latter included the complete 
administration of the entire series of 20 TAT 
cards in one session interrupted by a 10-minute 
rest period between cards 10 and 11, and the 
giving of the two halves of the test on different 
days separated by a two-day interval in line 
with the procedures recommended by Murray 
[7]. The third revised edition of the test was 
used throughout the study.

The same test instructions were used by all
examiners. The instructions differed for the 
two test situations only to indicate when the 
test would be completed and to include an 
extra reference to the previous test period 
when testing was resumed on a different day. 
The methods of administration and recording 
were discussed in detail in order to insure 
comparability of results among the various ex- 
aminers. Antecedent events and endings for 
the stories were to be sought by the examiners 
if not produced spontaneously by the subjects. 
This procedure was to be followed twice where 
necessary, and if a subject still failed to pro- 
vide antecedents or endings for his stories, no 
further probing was to be attempted. 


Subjects 


Fifty-four male and 56 female undergradu- 
ate students at the University of Connecticut 
were selected from the introductory courses 
in psychology. Since each student had been in- 
formed that he would have to serve as a sub- 
ject in some experiment in the department, no 
selective factors were introduced beyond that 
necessitated by the requirements of an equal 
number of subjects from each sex. 


Although it was desired to secure an equal 
number of subjects in each experimental group, 
practical considerations interfered with this 
objective. The female examiners tested a few 
more subjects than the male examiners, but 
the groups were roughly comparable. The 
division of subjects is given in Table 1. 


Table 1
Distribution of Subjects in Various
Experimental Groups

              Complete Session     Split Session
Examiner          Subjects           Subjects       Totals
               Male    Female     Male    Female
Male I           6        6         5        6        23
Male II          7        6         5        6        24
Female I         8        8         8        8        32
Female II        7        8         8        8        31
Totals          28       28        26       28       110





The average age of the male subjects ex- 
ceeded that of the females, a finding which 
undoubtedly was typical of university popula- 
tions during the academic year of 1948-1949. 


The mean ages of the two groups of subjects
were 22.80 years (SD, 2.92) for the males
and 18.94 years (SD, 1.29) for the females.
Over three-fourths of the males were veterans
of World War II.


Treatment of the Data 


One of the difficult problems in the present
undertaking concerns the treatment and
analysis of the TAT stories. The 2200 stories
provide us with a moderate amount of material
which must be ordered in some meaningful
fashion. In the absence of objective test scores,
analytic methods must be devised which have
some reliability and which can reflect
important trends or differences.


As a first step, each story was typed on an 
individual card with identifying material in 
coded form. Then, following the procedures 
developed in a previous study [6], each story 
was rated independently by the authors in 
terms of four criteria. These were: (1) Level
of plot, (2) Mood of story, (3) Outcome, and 
(4) Activity of Central Character. A three- 
point rating scale was used in rating these four 
attributes of the stories. Since the criteria for 
evaluating Mood, Outcome, and Activity have 
been presented in detail in a previous publica- 
tion [6], only brief mention of them will be 
made here. Mood and Outcome of stories 
were rated as positive (generally happy, no 
conflict, etc.), negative (conflict, danger, un-
happiness), or neutral (no apparent affect or
balance of positive and negative moods). In a 
similar fashion, the activity of the central char- 
acter was rated as active, passive, or neutral. 
Since the outcomes of a certain number of 
stories were uncertain and equivocal, one addi- 
tional category was used in rating this aspect of 
the stories. All such outcomes were given a 
“question mark” rating. In the present investi-
gation the complexity or plot level of the
stories also was evaluated in terms of the fol- 
lowing three levels: 


1. Description—No plot. Generalized description; 
no involvement of characters; enumeration of ob-
jects or persons.

2. Simple Plot. Superficial personal relations; 
vague antecedents; cliché; description of others— 
no personal involvement; fanciful; confused; obvi- 
ously borrowed plot. 

3. Complex Plot. Personal projection; attitudes
toward self and others; complex motives; personal
relationship; probability (realism). 


Several intensive sessions were held by the 
three raters in order to clarify the various cri- 
teria. In addition, two trial rating periods were 
used and further discussions held to insure 
maximum reliability of ratings. As a result the 
agreement between the three raters is high. 


(See Table 2.) 


Table 2

Percentages of Agreement Between Three Raters
on Four Aspects of TAT Stories

                               Raters
                       A, B     A, C     B, C
Level of story          .92      .90      .93
Mood                    .95      .95      .95
Outcome                 .92      .93      .93
Activity-Passivity      .91      .91      .90


The ratings for each story were then avail- 
able for purposes of comparing the factors be- 
ing appraised. Since there was complete agree- 
ment between all three raters on over 90 per 
cent of the ratings, no attempt was made to 
average the ratings. In those cases of less than 
perfect agreement, the modal rating was used 
as the subject’s score on a given picture. In 
this way, within the limitations of the factors 
rated, a more objective evaluation of changes 
in TAT stories under special conditions could 
be made. Obviously, many other attributes of 
the stories also could be investigated, and such 
research is now in progress. 
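
The rating procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ratings shown are invented, the function names are hypothetical, and the handling of a three-way disagreement (no modal rating) is an assumption, since the text does not say how such cases were resolved.

```python
from collections import Counter
from itertools import combinations


def modal_rating(ratings):
    """Return the most frequent rating, or None when no single rating is most frequent."""
    counts = Counter(ratings).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # assumption: ties (e.g., three different ratings) are left unresolved
    return counts[0][0]


def pairwise_agreement(ratings_by_rater):
    """Proportion of stories on which each pair of raters gave the same rating."""
    result = {}
    for a, b in combinations(sorted(ratings_by_rater), 2):
        ra, rb = ratings_by_rater[a], ratings_by_rater[b]
        same = sum(x == y for x, y in zip(ra, rb))
        result[(a, b)] = same / len(ra)
    return result


# Hypothetical mood ratings by three raters for five stories (not data from the study).
ratings = {
    "A": ["neg", "neg", "neu", "pos", "neg"],
    "B": ["neg", "neu", "neu", "pos", "neg"],
    "C": ["neg", "neg", "neu", "pos", "pos"],
}

agreement = pairwise_agreement(ratings)
modes = [modal_rating(trio) for trio in zip(ratings["A"], ratings["B"], ratings["C"])]
```

With three raters, the modal rating is simply the rating on which at least two raters agree.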


Results 


Method of administration. After all the 
2200 stories were rated for level, mood, out-
come, and activity, a comparison was made be-
tween the stories secured with one complete
test period and those secured with the split 
sessions. For this purpose the ratings for the 
stories secured from cards 11 through 20, the 
stories directly affected by the change in meth- 
od of administration, were compared for the 
two different groups of subjects. The frequen- 
cy of the different ratings for each story was 
tallied and the findings, in percentage form, 
are given in Table 3. Inspection of the data 
reveals little difference between the two meth- 
ods of administration with regard to the fac- 
tors evaluated. Analysis of the data by the chi 
square test also fails to show significant differ- 
ences between the two methods of administra- 
tion on any of the four attributes (P > .30). 
Within the framework of the present study,
therefore, method of administration does not
appreciably affect the test responses secured
from our subjects.

Sex differences. As already indicated, in
planning the present research we were inter- 
ested in evaluating the possible influence of 
sex of examiner, sex of subject, and their inter- 
action on TAT stories. It was hypothesized 
that such factors might influence responses in 
a general manner, or more specifically in rela- 
tion to selected test cards. Once our data were 
secured, however, some problems became evi- 
dent with regard to the analysis of the data. 

Table 3

Percentages of Ratings for Four Attributes of TAT Stories
Under Different Methods of Administration

                          Level                   Mood                    Outcome                 Activity
                    No     Simple  Complex   Neg-    Neu-   Pos-     Neg-    Neu-   Pos-     Pas-   Neu-   Ac-
                    Plot   Plot    Plot      ative   tral   itive    ative   tral   itive    sive   tral   tive
Split Sessions      .09    .55     .36       .64     .25    .11      .31     .32    .37      .12    .61    .27
Complete Sessions   .09    .53     .38       .65     .25    .10      .35     .32    .33      .13    .59    .28

In terms of the kinds of ratings secured,
the chi square test was considered first as the
simplest method of comparing the frequencies
of the various ratings for the different subject-
examiner groups. In this way every score
could be tallied and comparisons made. How-
ever, to do this for all of the ratings on all
cards combined would give an inflated num-
ber of cases and a spuriously high level of sig-
nificance since each subject would contribute
20 scores for each comparison. Another ap-
proach was to use the analysis of variance
method. The first question raised here was
whether our data could be considered con-
tinuous on the basis of the three-point rating
system for each attribute. To avoid this prob-
lem the frequency of each type of rating was
tallied for each subject for purposes of com-
paring the different groups of subjects. Be-
cause of the peculiar nature of the distribu-
tions of ratings, this procedure did not lend
itself well to either the chi square test or
analysis of variance [2]. After much delibera-
tion and consultation, the following method
was adopted as being the most feasible.⁴

Each of the three ratings for the attributes
studied was given a numerical value from one
to three; e.g., the lowest plot level was scored
as one and the highest as three. Similarly, a
negative mood or outcome and a passive role
were given a value of one, all neutral ratings
were scored as two, and positive moods, out-
comes, and active roles were given a value of
three. By then adding the values assigned to
each story a total score was obtained for every
subject on each of the four attributes. The
distribution of the obtained scores was continu-
ous, and an analysis of variance was then at-
tempted.⁵ Thus by use of this method, we
could evaluate the influence of the sex of ex-
aminer, sex of subject, and the possible inter-
action between these two variables [3, p. 213].
Since there were unequal numbers of subjects
in the four groups, all groups were reduced
by random selection to 23 in number, the size
of the smallest existing group. Furthermore,
since only 12 of the 20 TAT cards are identi-
cal for male and female subjects, separate sta-
tistical analyses were made of the responses to
these cards and of those to the remaining cards.


⁴We have referred to the problems encountered
in analyzing our data since such problems are not
infrequent in working with thematic material. A
great deal of time and energy was expended in the
process since some test of significance was desired
and standard techniques were not readily applicable
to such data. It is relatively easy to misuse the sta-
tistical tests of significance or to use cut-off points
after inspecting your data which only lead to erron-
eous conclusions.

⁵One modification had to be introduced with re-
gard to the outcome of the stories because of the in-
decisive or questionable outcomes. All such out-
comes were finally classified as neutral for purposes
of statistical comparison. In all they constituted less
than 5 per cent of the outcomes.

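
The scoring scheme just described (values of one to three summed over a subject's stories, with questionable outcomes counted as neutral) can be illustrated with a short Python sketch. The label strings, the `VALUES` table, and the sample stories are hypothetical conveniences chosen for the example; only the numerical values come from the text.

```python
# Numerical values assigned to each rating, as described in the text.
# The label strings themselves are illustrative assumptions.
VALUES = {
    "level": {"no plot": 1, "simple": 2, "complex": 3},
    "mood": {"negative": 1, "neutral": 2, "positive": 3},
    # "?" marks an uncertain outcome, which is scored as neutral.
    "outcome": {"negative": 1, "neutral": 2, "positive": 3, "?": 2},
    "activity": {"passive": 1, "neutral": 2, "active": 3},
}


def total_scores(stories):
    """Sum the rating values over one subject's stories, separately for each attribute."""
    totals = {attribute: 0 for attribute in VALUES}
    for story in stories:
        for attribute, table in VALUES.items():
            totals[attribute] += table[story[attribute]]
    return totals


# Three invented stories from one hypothetical subject.
stories = [
    {"level": "simple", "mood": "negative", "outcome": "?", "activity": "neutral"},
    {"level": "complex", "mood": "neutral", "outcome": "positive", "activity": "active"},
    {"level": "no plot", "mood": "negative", "outcome": "negative", "activity": "passive"},
]

totals = total_scores(stories)
```

For a full record of 20 stories, each attribute total would range from 20 to 60, which is what permits the totals to be treated as roughly continuous scores.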

When the scores for the 12 similar cards 
are compared for the different groups of sub- 
jects, no significant differences are found be- 
tween the groups when the 5 per cent level of 
confidence is adopted. This was true for all 
four attributes studied. Thus, in terms of these 
attributes of the stories, neither the sex of the 
examiner, the sex of the subject, nor the inter- 
action between these two variables appeared to 
be a significant influencing factor. However, 
when similar comparisons are made for the re-
maining eight cards, some significant findings
are secured. 

The analysis of the ratings on the eight 
cards which differ for the two sexes shows 
highly significant differences between the male 
and female subjects with regard to the level 
and mood of the stories (p < .001). No sig-
nificant differences were found with respect to
the outcome or activity of the stories. Further-
more, neither the sex of the examiner nor the
resultant interaction between sex of examiner 
and sex of subject appeared to influence the 
stories in terms of any of the four attributes 
studied. The fact that differences were found 
with some factors and not with others cannot 
be adequately explained at the present time. 
Research now in progress may clarify this 
problem. 

On the basis of the above results one would 
conclude that the pictures designated for use 
only with male subjects differ significantly 
from those for use with female subjects. In 
general, the female cards tend to bring forth 
more negatively toned stories than do those 
for male subjects. The female subjects in re- 
sponse to these cards also produce more com- 
plex plots than do the males toward the corre- 
sponding cards. In the light of our results 
with the 12 cards which are given to both 
sexes, the difference between the sexes on the 
remaining eight cards is interpreted as indicat- 
ing the nonequivalence of the latter sets. 


Summary and Conclusions 


As a preliminary to normative research on 
the TAT an attempt was made to evaluate the 
influence of method of administration, sex of 
examiner, and sex of subject on TAT stories. 
The TAT was administered to 54 male and
56 female subjects. Half of the subjects took
the test in one complete setting whereas the 
remaining subjects received one-half of the 
test in each of two sessions separated by a two- 
day interval. Approximately similar numbers 
of male and female subjects were tested by 
male and female examiners under the two 
different methods of administration. The re- 
sulting 2200 TAT stories were then rated on 
a three-point scale by three judges in terms of 
level of plot, mood and outcome of story, and 
activity of central character. 

A comparison of the two groups of subjects 
separated in terms of method of administra- 
tion revealed no significant differences between 
the groups on any of the attributes rated. 
Furthermore, with regard to the 12 cards giv- 
en to subjects of both sexes, no significant dif- 
ferences between the male and female subjects 
were found for any of the test attributes stud- 
ied. However, the analysis of the remaining 
eight cards showed significant differences be- 
tween male and female subjects in terms of the 
level and mood of the stories. This finding is 
interpreted as indicating that the eight cards 
designated separately for administration to 
male and female subjects are not equivalent 
in terms of their stimulus properties. Neither 
the sex of the examiner nor the interaction be- 
tween the subject and examiner in terms of 
sex was significant as far as the present test
attributes are concerned.


Received May 25, 1951. 


References

1. Coleman, W. The Thematic Apperception Test:
   I. Effect of recent experience. II. Some quantita-
   tive observations. J. clin. Psychol., 1947, 3, 257-
   264.

2. Cronbach, L. J. Statistical methods applied to
   Rorschach scores. Psychol. Bull., 1949, 46, 393-
   429.

3. Edwards, A. L. Statistical analysis. New York:
   Rinehart, 1946.

4. Eron, L. D. Frequencies of themes and identi-
   fications in the stories of schizophrenic patients
   and non-hospitalized college students. J. con-
   sult. Psychol., 1948, 12, 387-395.

5. Eron, L. D. A normative study of the Thematic
   Apperception Test. Psychol. Monogr., 1950, 64
   (9), Whole No. 315.

6. Garfield, S. L., & Eron, L. D. Interpreting mood
   and activity in Thematic Apperception Test
   stories. J. abnorm. soc. Psychol., 1948, 43, 338-
   345.

7. Murray, H. A. Manual for the Thematic Apper-
   ception Test. Cambridge: Harvard University
   Press, 1943.

8. Rosenzweig, S. Apperceptive norms for the
   Thematic Apperception Test: I. The problem
   of norms in projective method. J. Pers., 1949,
   17, 475-482.

9. Rosenzweig, S., & Fleming, E. Apperceptive
   norms for the Thematic Apperception Test: II.
   An empirical investigation. J. Pers., 1949, 17,
   483-503.


The Examiner As A Variable in the Draw- 
A-Person Test 


Wayne H. Holtzman 


The University of Texas 


In recent years increased interest has been 
shown in the drawing of the human figure as 
a projective technique for investigating per- 
sonality. Although most of the published re- 
ports concerning this test have dealt with its 
use by a highly trained clinician employing an 
intuitive approach, several recent workers have 
been concerned with its standardization and use 
in a more objective manner [1, 3, 4]. Impor- 
tant considerations for such standardization 
are such factors as instructions given the sub- 
ject, general procedures for test administra- 
tion, the personal characteristics of the exam- 
iner, and the examiner-subject relationship. 
The present study is concerned with the effect 
of variations in the personality and sex of the 
examiner upon performance of male and fe- 
male subjects in the Draw-A-Person Test. 

This investigation was prompted by the 
results of an experiment reported by Sinnett 
and Eglash [4]. Working with 68 college 
sophomore women Sinnett and Eglash conclud- 
ed that the size of the figure drawn by a sub- 
ject may be of importance in an analysis of 
the examiner-subject relationship. If these con- 
clusions are generally true, they must be taken 
into account in all future research with the 
Draw-A-Person Test. 


Description of the Study 


Four experienced examiners, two male and 
two female, were chosen from a group of ad- 
vanced graduate students in clinical psychol- 
ogy. The two pairs of examiners were selected 
so as to maximize differences in examiner ap- 
pearance and personality within both sexes. 
Examiner M1 was nearly a foot taller and 
sixty pounds heavier than the other male ex- 
aminer, M2. The two female examiners, F1 
and F2, were approximately the same size but
differed considerably in degree of feminine
qualities. All four examiners were given short
preliminary training in the administration of
the test using written instructions adapted
from Machover [2].


In order to control all factors other than 
the examiner’s sex and personality, the experi- 
ment was conducted at regularly scheduled 
hours in four testing rooms, one for each ex-
aminer, and the subjects were carefully ran-
domized among the examiners at the beginning
of each testing period. All examiners tested 
subjects simultaneously, rotating rooms each 
testing period to eliminate room differences as 
a factor. 

The subjects consisted of 40 male and 40 fe- 
male college students taken from classes in edu- 
cation and psychology at the University of 
Texas during the first summer session of 1950. 
Ages of the males ranged from 18 to 48 and of 
the females from 18 to 58 with median ages of 
26 and 25, respectively. Sixty-three per cent of 
the men and 43 per cent of the women were 
married. Assignment of subjects to examiners 
was accomplished by a randomized block de-
sign so that each examiner tested ten male and
ten female subjects. This made possible the 
examination of two major hypotheses concern- 
ing the effect of the examiner upon drawing 
characteristics: (1) The sex of the examiner 
has a measurable effect upon the drawings pro- 
duced in the Draw-A-Person Test by (a) male 
subjects, and by (b) female subjects. (2) The
personal characteristics of the examiner aside
from sex have a measurable effect upon the
drawings produced by (a) male subjects, and
by (b) female subjects.


Methods of Analysis 


Three levels of drawing analysis were at-
tempted: (1) objective characteristics of the
drawing; (2) judged “masculinity” of the fig- 
ure; and (3) intuitive guessing of examiner's 
identity by trained judges. 

The height of the figure was measured in 
centimeters similar to the analysis made by 
Sinnett and Eglash. Other objective variables 
analyzed were the height of the head; the ratio 
of figure size to head size; the sex of the first 
of the two figures drawn; the tilt of the figure 
from a vertical frame; and the location of the 
figure on the sheet of paper. 

The masculinity of each figure was obtained 
by an extensive judging process involving 12 
trained judges, students in an advanced course 
in projective techniques. Each drawing was 
rated independently by each of the 12 judges 
on a five-point scale ranging from “Very Mas- 
culine” through “Neutral” to “Very Femi- 
nine.” Male and female figures were judged 
in separate groups to minimize shifts in the 
frame of reference due to the sex of the figure. 
Prior to analysis of the experimental pairs of 
drawings the judges were given a short train- 
ing period with a number of drawings collected 
previously. The final masculinity score for a 
particular drawing was obtained by pooling the 
ratings of the 12 judges and computing a mean 
rating. Eighty drawings were judged a second 
time two weeks after the first ratings. The 
correlation between the two sets of mean 
masculinity ratings was .87, indicating a high 
degree of reliability. 

Several “blind” matching experiments using 
the same 12 judges were attempted in order 
to examine the ability of trained judges to 
guess correctly certain examiner characteristics 
from the drawing alone. The first experiment 
was performed in three stages, the first two 
using 16 pairs of drawings each and the third 
consisting of 40 pairs. The pairs of drawings 
in each set were selected at random from a 
larger sample of drawings. Each judge, work- 
ing independently of the others, was asked to 
guess the sex of the subject who drew the two 
figures in a given pair of drawings. Since the 
larger sample from which the 16 pairs of draw- 
ings had been selected contained an equal num- 
ber of male and female subjects, each judge 
was informed that the probability of any given 
pair of drawings having come from a male 
subject was .50. In addition to individual
evaluation of each judge’s guessing accuracy,
an analysis of the majority judgment for each
drawing was made by pooling the individual
guesses and selecting the majority guess as the
pooled estimate of each subject’s sex. The same
general procedures were used for the second 
set of 16 pairs of drawings and for the third 
group of 40 pairs. The three repetitions of 
the first matching experiment were spaced 
about two weeks apart. 

In the second experiment, the 12 judges, 
working independently, were given forty pairs 
of drawings in a random serial order, all draw- 
ings by male subjects, and were asked to guess 
the sex of the examiner who collected each 
pair of figures. In a similar manner the judges 
were given the forty pairs of drawings made 
by female subjects and were asked to guess the 
sex of the examiner. In both instances the only 
information given the judges was the fact that 
for a given pair of drawings the probability of 
coming from a male examiner was .50. 

The third series of matching experiments 
was designed to evaluate each judge’s ability 
to guess correctly the specific identity of the 
examiner. The judges were given four sets of 
twenty pairs of drawings, one set consisting of 
all the male subjects tested by the two male 
examiners, a second set consisting of all the 
female subjects tested by the male examiners, 
and similarly, a third and fourth set of draw- 
ings collected by the two female examiners. 
All 12 judges were well acquainted with the 
four psychologists doing the testing. 


Results 


Objective characteristics of the drawings. 
Six objective measures were investigated. 
Three of these measures, the height of the fig- 
ure, the height of the head, and the ratio of 
figure to head size, were treated statistically 
in a series of two-by-two factorial designs. 
None of these measures showed variation 
which could be attributed to the sex of the 
subject, the sex of the examiner, or to the 
personality of the examiner. Similarly, no dif- 
ferences between any of the groups for either 
the tilt or the location of the figure were 
found. 

One hypothesis regarding examiner influ- 
ence can be stated as follows: Subjects tend to 
draw first a figure whose sex is similar to the
examiner's sex. Examination of the data pre-
sented in Tables 1 and 2 leads to the rejection
of this hypothesis and the acceptance of an al-
ternative; namely, subjects tend to draw first
a figure whose sex is similar to their own. 
The chi square for Table 2 is 22.7, which is 
significant at the .001 level of confidence. 


Table 1

Relation of Sex of First Figure Drawn
to Examiner’s Sex

Sex of            Sex of Figure
Examiner         M        F      Total
M               24       16       40
F               23       17       40
Total           47       33       80


Table 2

Relation of Sex of First Figure Drawn
to Subject’s Sex

Sex of            Sex of Figure
Subject          M        F      Total
M               34        6       40
F               13       27       40
Total           47       33       80
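
The chi square values for the two contingency tables above can be checked with a short computation. The sketch below is a plain Pearson chi square without a continuity correction; that choice is an assumption, made because it reproduces the reported value for the subject's-sex table, and the function name is hypothetical.

```python
def chi_square(table):
    """Pearson chi square for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Rows: sex of subject (M, F); columns: sex of first figure drawn (M, F).
by_subject = [[34, 6], [13, 27]]
# Rows: sex of examiner (M, F); columns: sex of first figure drawn (M, F).
by_examiner = [[24, 16], [23, 17]]

chi_subject = chi_square(by_subject)
chi_examiner = chi_square(by_examiner)
```

The first table yields a value of about 22.7, in agreement with the text, whereas the examiner table yields a value close to zero.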


Judged Masculinity of the figures. Analysis
of Table 3 reveals no variation in the mascu-
linity of the figures which can be attributed to
the sex of the examiner. The figures drawn 
by men, however, were definitely more mascu- 
line in appearance than the drawings made by 
women. 


Table 3

Examiner’s Sex and Subject’s Sex as Factors in the
Masculinity of Male and Female Figures

Source of                          F Values
Variation          d.f.     Male Fig.     Female Fig.
Examiner’s sex       1         .26            .02
Subject’s sex        1        7.95**        11.80**
Interaction          1         .28           4.22*
Error               76

**.01 level of significance.
*.05 level of significance.


When the data are fractionated further in
an attempt to analyze the within-sex person-
ality factors of the examiners the results are
similar to those obtained when examiner-sex
was investigated. These analyses are presented
in Tables 4 and 5.


Table 4

Male Examiner’s Personality and Subject’s Sex as
Factors in the Masculinity of Male and
Female Figures

Source of                          F Values
Variation          d.f.     Male Fig.     Female Fig.
Examiners M1
  and M2             1         .33            .10
Subject’s sex        1        2.92            .98
Interaction          1        1.31            .13
Error               36

Table 5

Female Examiner’s Personality and Subject’s Sex as
Factors in the Masculinity of Male
and Female Figures

Source of                          F Values
Variation          d.f.     Male Fig.     Female Fig.
Examiners F1
  and F2             1         .21            .34
Subject’s sex        1        4.71          14.37**
Interaction          1        1              2.78
Error               36

**.01 level of significance.
*.05 level of significance.


Matching experiments. Nine of the twelve 
judges guessed the sex of the subject correctly 
better than 70 per cent of the time. A score 
of 70 per cent or better is significant at the 
.001 level. Table 6 summarizes the results
when the majority judgments were analyzed. 
The chi square for this contingency table is 
27.225, significant at the .001 level. 
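
The claim that 70 per cent correct is significant at the .001 level can be illustrated with an exact binomial tail. The sketch assumes that each judge's accuracy is taken over all 72 pairs of drawings (16 + 16 + 40), that chance accuracy is .50, and that the test is one-tailed; none of these details is stated explicitly in the text, and the function name is hypothetical.

```python
from math import ceil, comb


def upper_tail(n, k, p=0.5):
    """Probability of k or more successes in n independent trials with success probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_pairs = 16 + 16 + 40            # assumed total number of pairs judged
k_needed = ceil(0.70 * n_pairs)   # smallest count that reaches 70 per cent
p_value = upper_tail(n_pairs, k_needed)
```

Under these assumptions the probability of reaching 70 per cent by guessing alone falls below .001.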


Table 6

Relationship Between Sex of Subject and Pooled
Judgments of Sex Based on Drawings Alone

Guessed Sex        Actual Sex of Subject
of Subject          M        F      Total
M                  31        9       40
F                   5       27       32
Total              36       36       72

In the second experiment no single judge
was able to guess the sex of the examiner bet- 
ter than chance. Similar results were obtained 
for the majority judgments. 

Results for the third experiment were like- 
wise negative. None of the judges considered 
individually or as a group could guess the 
examiner's identity better than chance. 

Discussion 

The results obtained by Sinnett and Eglash 
concerning the relationship between examiner 
personality and the height of the figure drawn 
by the subject were not supported. Only a few 
of the many possible drawing characteristics 
were examined in the present study. Several 
others were attempted, but it soon became ap- 
parent that the majority of signs occurred too 
infrequently to permit adequate analysis with 
only 80 cases. 

Generalization of these findings to the every- 
day clinical setting where one is dealing with 
disturbed individuals can be made only with 
great caution. The examiner and his relation- 
ship to the subject is undoubtedly of utmost 
importance in any testing situation. The pres- 
ent results indicate, however, that interexami- 
ner variation can be minimized to the point 
where it is of little importance in the analysis 
of the individual subject’s performance on the 
Draw-A-Person Test. 


Summary 


Forty male and forty female college stu- 
dents were administered the Draw-A-Person 
Test. Two male and two female examiners
each tested ten men and ten women selected
at random from the group of subjects. 


An intensive analysis of the drawing charac- 
teristics, using twelve trained judges in a 
series of experiments as well as examining 
certain objective measures, revealed no varia- 
tions in the drawings which could be attribut- 
ed to the examiner’s personality, sex, or physi- 
cal appearance. Highly significant differences 
in drawing characteristics were found, how- 
ever, which could be attributed to the sex of 
the subject taking the test. Women tend to 
draw more “feminine” human figures than do 
men. 

Contrary to the findings of Sinnett and 
Eglash, it is possible to minimize interexaminer 
variations in the Draw-A-Person Test, at 
least with normal adult subjects, to the point 
where such factors need not be seriously con-
sidered in the analysis of individual drawings.


Internal Consistency of the Scoring Categories
of the Rosenzweig Picture-Frustration Study¹


Mahlon V. Taylor, Jr. 


American Institute for Research, Pittsburgh, Pa 


Projective techniques seem typically to be-
come popular without regard to their reliabil-
ity. It is difficult to understand why this should
be, since projective instruments commonly deal 
with variables decidedly more tenuous than 
those assessed by objective tests, and the latter 
do not gain wide acceptance without evidence 
of satisfactory reliability. The explanation may 
lie in orientational divergence between propo- 
nents of rational-quantitative measurement and 
those who assert the superior validity of in- 
tuitive-qualitative methods of evaluation. 

Without prejudice to the intuitive-qualita- 
tive approach, the viewpoint taken here is that 
when an evaluative technique, projective or 
otherwise, presents results in numerical form 
it becomes subject to criticism on statistical
grounds. The relevant logic becomes mathe- 
matical as well as verbal. It is then proper for 
both clinician and psychometrician to make two 
inquiries: (a) what evidence do the scores on 
the measure provide as to the psychological 
reality of the hypothesized underlying vari-
able? (b) granting the reality of the measured
variable, how much confidence is to be placed 
in an obtained score? 

The answers to both inquiries hinge on re- 
liability, and perhaps especially reliability in 
the sense of internal consistency. With respect 
to the first inquiry, if a measure has high in- 
ternal consistency the items are largely meas- 
uring the same variable, the measure is com- 
posed of relatively homogeneous items, and the 
hypothesis of a single underlying variable is 
supported. If the internal consistency is low, 
the measure is complex (i.e., the items lack
homogeneity), and the hypothesis of a single
underlying variable is not supported.

¹This study was carried out at the Allegheny Vo-

The answer to the second inquiry is fur-
nished most directly by the standard error of
measurement derived from the reliability of 
the measure. 

The Rosenzweig Picture-Frustration Study 
(P-F) is a semiprojective technique for “As- 
sessing Reactions to Frustration” [7, p. 5], 
and yields numerical scores on six scoring cate-
gories (as well as on other indices). Neither
the manual [8] nor other literature provides 
information on internal consistency of the scor- 
ing categories. The present study was designed 
to obtain such evidence. 


Description of the Measure 


The P-F has been described elsewhere [7, 
8], but a brief recapitulation of aspects par- 
ticularly relevant to the present study may be 
helpful. The subject is presented with 24 car-
toon-like depictions of frustrating situations.
He is expected to identify with the frustrated 
figure and write a short response for each stim-
ulus or item. Each response is classified by
direction of aggression: extrapunitive (E), in-
tropunitive (I), or impunitive (M); and by
type of reaction: obstacle dominant (O-D),
ego-defensive (E-D), or need-persistive (N-P).
Scoring samples and norms are provided. These
six scoring categories represent variables hy-
pothesized by Rosenzweig. Criteria for classi-
fying responses were not deduced from P-F
results, but were based on antecedent premises. 


Method 


For present purposes the focus of interest is 
not on retest reliability (i.e., function fluctu- 
ation), but rather on reliability for a single 
administration. The latter kind of reliability 
can be estimated by split-half correlation or by
any of the “rational equivalence” formulas.
Kuder and Richardson [4, p. 152], and others, 
have decried the split-half method since the 
value obtained is not unique, but will vary ac- 
cording to the way in which the test is split 
into halves. 

Jackson and Ferguson [2, p. 72] stated that 
K-R (20) has been found “to yield very sat- 
isfactory results, [and] may be derived on the 
basis of the equivalence assumption only.” 
Their identical formula (29) is derived on the 
single assumption that average interitem co- 
variance in the test is equal to that in the hy- 
pothetical parallel test, and to that between 
the tests. The assumption amounts to a rea- 
sonable definition of parallel tests. It is not 
assumed that variances are equal among items
(i.e., the same proportion of scored responses
is made to each item) as does K-R (21).
Neither is the matrix of interitem correlations
assumed to be of rank one as in K-R Cases II,
III, and IV.

The Jackson-Ferguson derivation justifies
the application of their formula (29) or K-R
(20) to the P-F scoring categories, while the
K-R derivations would not. For any given
scoring category the “difficulty” or popularity
varies greatly among items, thus directly vio-
lating one K-R assumption, and making ful-
fillment of the other impossible, as when “items
are heterogeneous with respect to difficulty the
intercorrelations will not be equal, since the
correlation between two test items is not in-
dependent of their difficulties” [2, p. 76].

The analysis of variance method of Hoyt
[1] yields estimates of reliability identical with
those by J-F (29), requires the same data,
provides an independent check on computa-
tions, and determines the allocation of vari-
ance among persons, items, and remainder.
The pertinent formulas for estimating reli-
ability are:


1. Jackson-Ferguson (29),

$$ r_{tt} = \frac{n}{n-1} \cdot \frac{S_t^2 - \sum S_i^2}{S_t^2} $$

where:
$r_{tt}$ = reliability of a test,
$n$ = number of items in the test,
$S_i$ = standard deviation of item i,
$S_t$ = standard deviation of the test,
and $\sum S_i^2$ is over all items.


2. Hoyt [1, p. 155], 

$$r_{tt} = \frac{(\text{variance among individuals}) - (\text{remainder variance})}{(\text{variance among individuals})}$$


The formula for the conventional standard 
error of measurement, adapted to J-F notation 
is, 

$$S_{t\cdot\infty} = S_t \sqrt{1 - r_{tt}}.$$

Hoyt [1, p. 156] obtains a closely comparable 
estimate as, 

$$\sqrt{\frac{(\text{remainder sum of squares})}{(\text{degrees of freedom for individuals})}}$$

If Hoyt’s value is multiplied by 

$$\sqrt{\frac{n}{n-1} \cdot \frac{N-1}{N}}$$

($N$ being the number of individuals) 
it is identical with the conventional $S_{t\cdot\infty}$. 
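
The identities stated above can be checked numerically. The following is a minimal sketch, not the authors' computing procedure: the score matrix is hypothetical (random scores of 0, 1/2, or 1), the function names are illustrative, and variances are taken with divisor N throughout, which is an assumption about the convention behind the formulas.

```python
import random
from math import sqrt


def pvar(xs):
    """Population variance (divisor N), used for items and totals alike."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def jackson_ferguson_29(scores):
    """Return (r_tt, S_t) from a persons-by-items score matrix."""
    n = len(scores[0])
    sum_item_var = sum(pvar([row[i] for row in scores]) for i in range(n))
    test_var = pvar([sum(row) for row in scores])
    r_tt = n / (n - 1) * (test_var - sum_item_var) / test_var
    return r_tt, sqrt(test_var)


def hoyt_anova(scores):
    """Return (r_tt, Hoyt's standard error) from a two-way analysis of variance."""
    N, n = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (N * n)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_persons = n * sum((sum(row) / n - grand) ** 2 for row in scores)
    ss_items = N * sum(
        (sum(row[i] for row in scores) / N - grand) ** 2 for i in range(n)
    )
    ss_rem = ss_total - ss_persons - ss_items
    ms_persons = ss_persons / (N - 1)
    ms_rem = ss_rem / ((N - 1) * (n - 1))
    return (ms_persons - ms_rem) / ms_persons, sqrt(ss_rem / (N - 1))


# Hypothetical data: 40 persons, 8 items, scores of 0, 1/2, or 1.
rng = random.Random(0)
N, n = 40, 8
scores = []
for _ in range(N):
    tendency = 0.2 + 0.6 * rng.random()  # person's chance of a scored response
    row = []
    for _ in range(n):
        u = rng.random()
        row.append(1.0 if u < tendency else 0.5 if u < tendency + 0.1 else 0.0)
    scores.append(row)

r_jf, s_t = jackson_ferguson_29(scores)
r_hoyt, se_hoyt = hoyt_anova(scores)
se_conventional = s_t * sqrt(1 - r_jf)
se_hoyt_adjusted = se_hoyt * sqrt(n / (n - 1) * (N - 1) / N)
print(round(r_jf, 4), round(r_hoyt, 4))
print(round(se_conventional, 4), round(se_hoyt_adjusted, 4))
```

On any score matrix with non-zero total variance the two reliability estimates agree to rounding error, as do the two standard errors once Hoyt's value is multiplied by the factor given above.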


The Study 


The data were taken from the records of 
two samples (Ns = 130) of adult, white 
males tested for purposes of personnel evalua- 
tion or vocational guidance. Scoring had pre- 
viously been done by Rosenzweig’s criteria [8, 
pp. 167-201], with doubtful scorings independently 
checked. 

For each individual, each item response score 
and the category totals from the P-F Record 
Blank [8, p. 203] were punched on a Hollerith 
card. By means of the IBM counter-sorter, 
sums of scores for individuals were then 
checked against sums of scores for items to 
insure accuracy. 

The two samples had been drawn for other 
purposes on the basis of occupational criteria. 
Since the samples might represent different 
populations for the purposes of the present 
study, the data for each sample were first 
treated separately, homogeneity among samples 
of category means and variances verified, and 
the two samples combined. Table 1 gives, in 
raw score units, the separate and combined 
sample means, standard deviations, variances, 
sums of item variances, reliability estimates by 
J-F (29) or K-R (20), and standard errors 
of measurement for the six P-F scoring categories. 
Statistics in percentage scores [8, p. 202] for 
comparison with the manual norms [8, p. 207] 
appear in Table 2. 


Internal Consistency of the Rosenzweig P-F Study 


Table 1 

Separate and Combined Sample Statistics (in Raw 
Scores) for the P-F Scoring Categories 
(N₁ = N₂ = 130) 

                                       Scoring Category 
Statistic            Sample      E      I      M     O-D    E-D    N-P 

Mean                 1          8.36   7.38   7.94   4.08  11.37   8.24 
                     2          8.88   7.20   7.65   4.30  11.50   7.93 
                     1 + 2      8.62   7.29   7.79   4.19  11.43   8.09 

Standard             1          2.63   1.64   2.12   1.56   2.11   2.23 
Deviation            2          2.66   1.74   2.01   1.50   2.40   2.53 
                     1 + 2      2.65   1.70   2.07   1.53   2.26   2.39 

Variance             1          6.91   2.71   4.50   2.44   4.46   4.97 
                     2          7.03   3.03   4.06   2.24   5.78   6.41 
                     1 + 2      7.04   2.88   4.30   2.35   5.12   5.71 

Sum of Item          1          3.02   2.25   2.55   1.99   3.40   2.61 
Variances            2          3.23   2.25   2.82   2.23   3.53   2.80 
                     1 + 2      3.14   2.26   2.69   2.12   3.49   2.71 

Reliability          1           .59    .17    .45    .19    .25    .49 
Coefficient          2           .56    .27    .32    .01    .41    .59 
J-F (29) or          1 + 2       .58    .22    .39    .10    .33    .55 
K-R (20) 

Standard Error       1          1.69   1.49   1.57   1.41   1.83   1.58 
of Measurement       2          1.75   1.49   1.66   1.49   1.85   1.62 
                     1 + 2      1.72   1.50   1.62   1.45   1.85   1.61 


Table 2 

Combined Sample Statistics for the P-F Scoring 
Categories in Percentage* Scores 
(N = 260) 

Statistic                         E      I      M     O-D    E-D    N-P 
Mean                             35.9   30.4   32.5   17.5   47.6   33.7 
Standard deviation               11.0    7.1    8.6    6.4    9.4   10.0 
Standard error of measurement     7.2    6.2    6.8    6.0    7.7    6.7 

*Percentage scores given in the P-F manual [8, p. 202] 
are raw scores divided by the number of items and 
multiplied by 100. 
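
The entries of Tables 1 and 2 are tied together by J-F (29), the standard error formula, and the percentage-score conversion. As a rough check, the sketch below recomputes the combined-sample figures for E from the tabled variance, sum of item variances, standard deviation, and mean. The function and variable names are illustrative only, and small discrepancies from the printed values are to be expected because the inputs are themselves rounded.

```python
from math import sqrt

N_ITEMS = 24  # number of items in the P-F Study


def jf29_from_summary(test_var, sum_item_vars, n=N_ITEMS):
    """J-F (29) / K-R (20) computed from summary statistics."""
    return n / (n - 1) * (test_var - sum_item_vars) / test_var


def to_percentage(raw, n=N_ITEMS):
    """Raw score divided by the number of items and multiplied by 100."""
    return 100 * raw / n


# Combined-sample values for E, in raw-score units, as read from Table 1.
mean_E, sd_E, var_E, sum_item_var_E = 8.62, 2.65, 7.04, 3.14

r_E = jf29_from_summary(var_E, sum_item_var_E)
sem_E = sd_E * sqrt(1 - r_E)

print(round(r_E, 2))                     # reliability coefficient
print(round(sem_E, 2))                   # standard error, raw-score units
print(round(to_percentage(mean_E), 1))   # mean as a percentage score
print(round(to_percentage(sem_E), 1))    # standard error as a percentage score
```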

The results by analysis of variance cannot 
be presented as compactly, but the variances 
for persons, items, and remainder furnish 
additional information and are presented for 
the combined sample in Table 3. The $r_{tt}$’s are 
not shown as they are identical with those of 
Table 1. 

In the analysis of variance missing responses 
were treated as zero scores without loss in 
degrees of freedom [9]. When degrees of freedom 
for total and remainder were reduced by 
the number of missing responses, the resulting 
$r_{tt}$’s were slightly reduced, but only one 
sample coefficient was as much as .02 smaller. 

It should also be noted that in computing 
item variances, Hoyt’s formula, devised for 
response scores of one or zero, had to be 
modified to take account of response scores of 
one half. 


Interpretation 


The obtained reliability estimates (Table 
1) range from .01 for O-D with the second 
sample to .59 for E with the first sample and 
N-P with the second sample. For the combined 
sample the values run from .10 for O-D 
to .58 for E, with a mean $r_{tt}$ of .36. By usual 
standards, measures with reliabilities of this 
order are unsuitable for evaluation of individuals. 
Some argument might be made for E 
($r_{tt}$ = .58) and N-P ($r_{tt}$ = .55), but the 
remainder of the estimates are below .40. 

There are more direct answers, however, to 
the two inquiries suggested earlier. Evidence 
as to the psychological reality of the variables 
underlying five of the six scoring categories is 
furnished by the ratios of persons variance to 
remainder variance [3, p. 136]. Except for 
O-D the F-ratios are significant at the .05 
level, with E passing the .01 level. While this 
does not guarantee the nature of the underlying 
variables, it is of interest that of the six 
scoring categories only the obstacle-dominant 
does not seem to have been mentioned by 
Rosenzweig prior to the P-F, nor to be anywhere 
integrated with the relevant theoretical 
formulations [5, 6]. 


Table 3 

P-F Scoring Category Variances for Combined Sample (N = 260) 

Source        df       E        I        M       O-D      E-D      N-P 
Total        6239    .2132    .1986    .2074    .1297    .2257    .2006 
Items          23  22.3899  28.2769  25.8292  11.2204  21.8151  23.7643 
Persons       259    .2944    .1204    .1799    .0983    .2143    .2388 
Remainder    5957    .1241    .0936    .1097    .0882    .1428   [missing] 

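
The ratio of persons variance to remainder variance and Hoyt's reliability estimate are two views of the same quantities, since $r_{tt} = 1 - 1/F$. The sketch below recomputes both from the Table 3 entries for five categories. Decimal points lost in reproduction are restored here as an assumption, N-P is omitted because its remainder variance is not legible, and no significance levels are computed.

```python
# Persons and remainder variances (mean squares) as read from Table 3.
persons = {"E": 0.2944, "I": 0.1204, "M": 0.1799, "O-D": 0.0983, "E-D": 0.2143}
remainder = {"E": 0.1241, "I": 0.0936, "M": 0.1097, "O-D": 0.0882, "E-D": 0.1428}

f_ratio = {}
r_tt = {}
for category in persons:
    f_ratio[category] = persons[category] / remainder[category]
    # Hoyt: (persons - remainder) / persons, which equals 1 - 1/F.
    r_tt[category] = 1 - 1 / f_ratio[category]

for category in persons:
    print(category, round(f_ratio[category], 2), round(r_tt[category], 2))
```

The recomputed coefficients round to the combined-sample reliabilities reported for these categories in Table 1.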

Granting the reality of five of the scoring 
categories, the degree of confidence which can 
be placed in obtained scores is indicated by the 
standard error of measurement ($S_{t\cdot\infty}$) (Tables 
1 and 2). For a given true score, the corresponding 
obtained score is expected to fall 
within ±1.0 $S_{t\cdot\infty}$ of the true score about 
two-thirds of the time. Thus, assuming normal 
distribution, the middle 50 per cent of the sample 
scores on E would fall within ±0.6745 $S_t$ 
of the mean, the lowest quarter would fall 
below 29 (percentage score) and the highest 
quarter would score above 43. It happens that 
$S_{t\cdot\infty}$ here is nearly identical with 0.6745 $S_t$, so 
that true scores at the mean would be expected 
to correspond to obtained scores in the lowest 
or highest quarter of the distribution in about 
one-third of cases. Since E yielded the highest 
estimate of reliability, this illustration indicates 
the highest degree of confidence which 
should be placed on any of the obtained category 
scores with the present sample. 

At the opposite end of the reliability estimates, 
excluding O-D, is I. On the same basis 
as before, true scores at the mean would cor- 
respond to obtained scores in the lowest or 
highest quarter in about 44 per cent of cases. 
The lack of precision, of course, is due to the 
low reliabilities which indicate that 42 per 
cent to 78 per cent of the score variance in the 
five categories is not reliable variance. Possi- 
bly with more deviant populations the P-F is 
more reliable. The obtained results, however, 
do not show much variation between the two 
present samples, and there seems to be no rea- 
son to suppose that repeated samplings from 
similar populations, tested under similar con- 
ditions, would yield greatly variant results. 
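
The two illustrations above (about one-third of cases for E, about 44 per cent for I) follow from the reliability coefficient alone under the normality assumption. A minimal sketch, using only the Python standard library; the function name is illustrative, and the value .22 for I is taken from the statement that 78 per cent of its variance is not reliable.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def outer_quarter_share(r_tt):
    """Share of obtained scores falling in the lowest or highest quarter
    of the score distribution when the true score is at the mean."""
    quartile_z = STANDARD_NORMAL.inv_cdf(0.75)   # 0.6745 S_t from the mean
    # The standard error of measurement is S_t * sqrt(1 - r_tt), so the
    # quartile lies quartile_z / sqrt(1 - r_tt) standard errors away.
    z = quartile_z / sqrt(1 - r_tt)
    return 2 * (1 - STANDARD_NORMAL.cdf(z))


share_E = outer_quarter_share(0.58)
share_I = outer_quarter_share(0.22)
print(round(share_E, 2), round(share_I, 2))
```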

There are certain peculiarities of the P-F 
which, regardless of population, may seriously 
limit its reliability. Large “Items” variances 
militate against high reliabilities here as with 
the GCR [9]. It may not have been the in- 
tention to select stimuli for the P-F which 
would strongly favor responses in one scoring 
category on one item, in a second category on 
a second item, and so on, but that is the effect. 
Obtained category popularities run as high as 
.92 (E on Item 7) and as low as .00 (M on 
Item 19). It may also be questioned whether 
only 24 items can be expected to yield satisfactory 
reliabilities for as many as six variables. 
Calculations carried out on E scores of 
the first sample indicate that the P-F would 
need to be more than quadrupled in length 
(to 100 items) to raise the reliability of this 
category from .59 to .87. 
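
The source does not say how the lengthening estimate was computed. Assuming it rests on the Spearman-Brown prophecy formula, which is the usual tool for this question, the sketch below gives a reliability of about .86 for a 100-item form, in the neighbourhood of the reported .87; the small gap may reflect rounding or a different computation. The function names are illustrative.

```python
def spearman_brown(r, k):
    """Predicted reliability when a test is lengthened by a factor k."""
    return k * r / (1 + (k - 1) * r)


def length_factor(r, target):
    """Lengthening factor needed to move reliability from r to target."""
    return target * (1 - r) / (r * (1 - target))


r_100_items = spearman_brown(0.59, 100 / 24)
factor_for_87 = length_factor(0.59, 0.87)
print(round(r_100_items, 3))
print(round(factor_for_87, 2), round(24 * factor_for_87))
```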


Summary 


The internal consistency of a measure is 
indicative of the reality of the underlying vari- 
able and of the degree of confidence properly 
placed in obtained scores. In the absence of re- 
ported internal consistency estimates for the 
six scoring categories of the Rosenzweig Pic- 
ture-Frustration Study, the present investiga- 
tion was undertaken. 

The records for two samples (Ns = 130) 
of adult, white, community males who had 
taken the P-F during testing for vocational 
guidance or personnel evaluation provided the 
data. The samples did not differ significantly 
with respect to means or variances on the six 
P-F scoring categories and were combined. 
Reliability estimates for the combined sample 
(N = 260) by a “rational equivalence” formula 
or by analysis of variance ranged from 
.10 for obstacle-dominant (O-D) to .58 for 
extrapunitive (E) with a mean $r_{tt}$ of .36. The 
hypothesis that the categories discriminate 
among persons was supported at the .05 level 
for all but O-D, and at the .01 level for E. 
With the obtained standard errors of meas- 
urement, one-third of the true scores at the 
mean on E, and 44 per cent at the mean on I, 
would be expected to correspond to obtained 
scores in the highest or lowest quarters of the 
distributions. 

Reliabilities of this order (with the possible 
exception of E and N-P) were considered in- 
adequate in individual evaluation. While the 
$r_{tt}$’s might rise with a more deviant population, 
inherent limitations on the reliability of 
the P-F scoring categories were pointed out. 


Received June 2, 1951. 


References 


1. Hoyt, C. Test reliability estimated by analysis 
of variance. Psychometrika, 1941, 6, 153-160. 

2. Jackson, R. W. B., & Ferguson, G. A. Studies 
on the reliability of tests. Bulletin No. 12, 
Department of Educational Research, U. of 
Toronto, 1941. 

3. Johnson, P. O. Statistical methods in research. 
New York: Prentice-Hall, 1949. 

4. Kuder, G. F., & Richardson, M. W. The theory 
of the estimation of test reliability. 
Psychometrika, 1937, 2, 151-160. 

5. Rosenzweig, S. The experimental measurement 
of types of reaction to frustration. In H. A. 
Murray (Ed.), Explorations in personality. 
New York: Oxford University Press, 1938. Pp. 
585-589. 

6. Rosenzweig, S. An outline of frustration theory. 
In J. McV. Hunt (Ed.), Personality and the 
behavior disorders. New York: Ronald Press, 
1944. Pp. 379-388. 

7. Rosenzweig, S. The picture-association method 
and its application in a study of reactions to 
frustration. J. Pers., 1945, 14, 3-23. 

8. Rosenzweig, S., Fleming, E. E., & Clarke, H. J. 
Revised scoring manual for the Rosenzweig 
Picture-Frustration Study. J. Psychol., 1947, 24, 
165-208. 

9. Taylor, M. V., Jr., & Taylor, O. M. Internal 
consistency of the Group Conformity Rating of 
the Rosenzweig Picture-Frustration Study. J. 
consult. Psychol., 1951, 15, 250-252. 








Reliability or Variability? 
John S. Helmick 


University of California at Los Angeles 


In a recent article by Webb and De Haan 
[2] the subtest reliabilities of the Wechsler- 
Bellevue for normals and for schizophrenics 
were compared. The following statement was 
made: 


These results, showing the higher reliabilities of 
psychotics in performance tests are quite surprising 
in view of the fact that one would logically expect 
paranoid schizophrenics to obtain test reliabilities 
either similar to or lower than normals. It is dif- 
ficult to believe that the results could be an artifact 
caused either by extraordinary data or by the meth- 
od of calculation [2, p. 70]. 


A closer examination of the results presented 
raises several questions about such conclusions. 
Among these are the use of the Spearman-Brown 
formula when some of the halves are not equiv- 
alent, stepping up of some correlations which 
were not originally statistically significant, and 
the final generalization based on only one dif- 
ference between r’s significant at the .01 level 
and one more at the .05 level. 

Assuming the justifiability of these, how- 
ever, there is still a major criticism which 
affects the stated conclusions. It is the fail- 
ure to consider the differences in variability 
between the two groups. Even if a test is 
measuring with the same amount of error in 
two different groups, greater variability in 
one group will still produce a higher coefficient 
of reliability for that group. 


An examination of the data presented in 
the original article indicates that such differ- 
ences in variability are primarily responsible 
for the obtained differences in reliability fig- 
ures. A more meaningful estimate of the dif- 
ference between normals and schizophrenics in 
test reliabilities may be obtained either by com- 
puting standard errors of measurement [1, 
Formula 54] or by correcting the coefficients 
of correlation for differences in range [1, 
Formula 55]. Both of these techniques correct for 
the effect of differences in variability. 
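
The point can be illustrated with a small sketch. Formulas 54 and 55 of the cited text are not reproduced in this note, so the code assumes the standard textbook forms: the standard error of measurement as SD times the square root of one minus the reliability, and a range correction that holds the error variance fixed. The standard deviations and the error value are hypothetical.

```python
from math import sqrt


def standard_error(sd, r):
    """Standard error of measurement from a group's SD and reliability."""
    return sd * sqrt(1 - r)


def reliability_from_error(sd, sem):
    """Reliability implied by a group's SD and a given error of measurement."""
    return 1 - sem ** 2 / sd ** 2


def correct_for_range(r, sd_obtained, sd_target):
    """Reliability expected in a group with a different SD, assuming the
    error variance is the same in both groups."""
    return 1 - (sd_obtained ** 2 / sd_target ** 2) * (1 - r)


# Hypothetical groups measured with the same error (1.4 score points).
SEM = 1.4
r_narrow = reliability_from_error(sd=2.0, sem=SEM)   # less variable group
r_wide = reliability_from_error(sd=3.0, sem=SEM)     # more variable group
r_narrow_corrected = correct_for_range(r_narrow, sd_obtained=2.0, sd_target=3.0)

print(round(r_narrow, 2), round(r_wide, 2), round(r_narrow_corrected, 2))
```

With identical error of measurement the more variable group shows the higher coefficient, and correcting the less variable group's coefficient to the wider range removes the difference.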


Table 1 

Revised Comparison of Wechsler-Bellevue Subtest 
Reliabilities for Normals and Psychotics 

                                          Reliability Coefficients Corrected 
                   Standard Errors        to the Range of the Opposite Group 
                   of Measurement 
                                          Normals      Psychotics   Normals     Psychotics 
Subtest            Normals  Psychotics    (corrected)  (obtained)   (obtained)  (corrected) 

Vocabulary          1.37      1.68           .94          .91          .94         .91 
Arithmetic          1.06      1.03           .80          .81          .82         .83 
Information         1.58      1.71           .86          .84          .82         .79 
Block Design        2.90      2.81           .85          .86          .76         .77 
Similarities        1.80      1.95           .77          .73          .74         .69 
Comprehension       1.96      1.94           .72          .73          .53         .54 
Digit Span          1.44      1.40           .50          .54          .44         .48 
Picture Comp.       1.48      1.33           .76          .81          .42         .54 
Picture Arrgm.      2.23      2.03           .55          .63          .29         .41 








Application of these techniques yields the 
results presented in Table 1. While the con- 
clusion quoted above referred specifically to 
the performance tests, it apparently was 
reached after the results had been obtained. 
Therefore, it seemed desirable to re-evaluate 
the entire set. The required SD’s for each 
total subtest were computed from the half-test 
SD’s and the actual r between halves. Since 
no SD’s were given for Object Assembly this 
subtest could not be included. These SD’s and 
the corrected reliability coefficients were used 
in computing the standard errors. In correct- 
ing the reliability coefficients for range, the 
coefficients for each group were corrected to 
the range of the other group and the two sets 
of comparisons are given, because there may be 
some question as to which should be consid- 
ered the standard. 


As can be seen, there are now no marked 
differences between normals and psychotics. If 
the F test is applied to the standard errors 
and if the reliabilities are converted to z’s, no 
significant differences between the groups will 
be found in the table. It would seem, then, 
that the evidence for the difference between 
reliabilities for normals and for psychotics is 
something of an artifact, not necessarily of the 
method of computation or of the data, but of 
the interpretation of the coefficients without 
regard to the variability of the groups upon 
which they were obtained. 
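
The comparison of coefficients by conversion to z’s can be sketched as follows. The note does not report the group sizes, so the n of 50 per group is purely hypothetical, and the test treats the reliabilities as ordinary correlations, which is only approximate for stepped-up split-half values. The pair of coefficients is the Picture Completion pair of .42 and .54 from Table 1.

```python
from math import atanh, sqrt
from statistics import NormalDist


def fisher_z_difference(r1, n1, r2, n2):
    """Two-sided test of the difference between two independent
    correlations after Fisher's z transformation."""
    z1, z2 = atanh(r1), atanh(r2)
    standard_error = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / standard_error
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Group sizes of 50 are hypothetical; they are not given in the note.
z, p = fisher_z_difference(0.42, 50, 0.54, 50)
print(round(z, 2), round(p, 2))
```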

A conclusion more nearly justified by the 
data is that the psychotics are more variable 
in their test performance than the normals. 
One difference in variance is significant at the 
.02 level, and the normals show greater variability 
on only one of the subtests. These differences, 
if any, are the ones which would 
seem to call for an explanation of their psy- 
chological significance. 


Nothing in this discussion is meant to ques- 
tion that portion of the original article em- 
phasizing the need for high reliabilities in or- 
der to make adequate interpretation of profiles 
or of differences between subtest scores. That 
is a requirement well worth emphasizing. This 
note is addressed solely to the point that while 
the necessity of considering the variability of 
the sample in interpreting the coefficient of 
correlation is stressed in many texts on statis- 
tics and measurement, it nevertheless seems 
necessary to call attention to it once more. 


Corrections for Variability: A Reply 
Wilse B. Webb 


Washington University, St. Louis, Mo. 


Responses to comments fall into three gen- 
eral categories: (a) the commented upon re- 
mains silent and hopes he does not appear too 
idiotic; (b) the headless body mumbles “touché”; 
(c) the reply begins “I’m glad you 
brought that up but... . ” Fortunately, the 
comments of Helmick [1] seem to allow the 
third category of reply. 


Helmick’s comment “Reliability or Variabil- 
ity” is primarily concerned with the “failure” 
of Webb and De Haan [2] to correct their 
correlations for differences in variability. The 
corrections can be statistically applied to the 
correlations in question. With the application 
of any statistic, however, a second (or first?) 
question of logic is necessarily involved. The 
application of a correction for restriction of 
range to the data in question appears logically 
suspect. 

When corrections for range restriction are 
applied, a theoretical population is, of course, 
created. Typically, two circumstances of meas- 
urement permit the logical construction of 
such theoretical populations: (a) the data to 
which the statistic has been applied are known 
to be a truncated portion of a total population 
and one may wish to apply one’s measurement 
to this total population in the future, or (b) a 
measurement error exists which may be elimi- 
nated in the future and one wishes to estimate 
the effect of this minimization of error. 


The present data do not meet either of these 
criteria. Scores on normal individuals do not 
logically appear to be truncated segments of 
a psychotic population. It is, of course, reason- 
able to believe that the variability of psychotics 
is related to normal variability. However, the 
if-then population created by the present ap- 
plication of a range restriction formula causes 
one’s head to whirl. “If the variability of psy- 
chotics was that of normals, then . . . .” The 
group of normal psychotics or psychotic nor- 
mals created, dependent upon the direction 


of correction, does not seem a particularly 
meaningful, logical, or worth-while group for 
research purposes. To correct “out” the vari- 
ability in the measurement of psychotics is also 
illogical. This variability does not appear to 
be an error in measurement but rather a more 
fundamental variance. The elimination of this 
variability would best represent a change in 
the population rather than in the mode of 
measurement. 


As Helmick points out “ . . . the psychotics 
are more variable in their test performance 
than the normals.” To change this seems to 
create a theoretical population which is essen- 
tially meaningless. 

The application of the standard error of 
measurement formula raises a further problem. 
It is known that the standard error of meas- 
urement contains an explicit assumption that 
the errors in measurement are unrelated. This 
seems a doubtful assumption in this instance— 
errors present on odd trials are likely to exist 
on even trials. This would be particularly true 
of psychotic groups. Certainly a large portion 
of the error variance existent in these measurements 
would be correlated and an interpretation 
of the obtained standard errors of measurement 
would necessarily be cautious and 
tricky. 

We would agree with Helmick that the 
variability of psychotic performance must be 
recognized and dealt with. However, it is suggested 
that therapy, not statistical formulae, 
would result in a more logical change in this 
variability. 


Books 


Buell, Bradley, and Associates. Community plan- 
ning for human services. New York: Columbia 
Univ. Press, 1952. Pp. xiv + 464. $5.50. 


Community Research Associates made a compre- 
hensive survey of the social needs and resources of 
St. Paul, Minnesota. This volume is in part a re- 
port of that survey, and in part an argument for 
the improved integration of community services. 
The social problems of the community are described 
under four main categories: dependency, ill-health, 
maladjustment, and recreational need. A challenging 
finding of the survey is the high degree of 
relationship among these areas of distress; most 
families with problems in one area had difficulties in 
other areas also. Six per cent of the families 
contributed over half of the problems. Although the 
book contains no evidence that a psychologist 
participated directly in the survey, the place of 
psychologists in integrated community services is 
recognized adequately.—L. F. S. 


Cleugh, M. F. Psychology in the service of the 
school. New York: Philosophical Library, 1951. 
Pp. vii + 183. $3.75. 

The book is planned for parents, social workers, 
and educators but is excellent for anyone planning 
to be a school psychologist. While the illustrations 
are based on English schools and conditions, they 
have their counterparts in our schools and homes. A 
real effort has been made to discuss children’s ad- 
justment problems in simple, everyday English, 
with a minimum of technical language. The book 
places the emphasis on minor adjustment problems. 
While reserving serious problems for the specialists 
and the child guidance clinics, the author recog- 
nizes that teachers and parents are constantly being 
asked to help in minor difficulties, and encourages 
their use of common sense. It includes an excellent 
guide, with illustrations, which should help the lay 
person distinguish major problems from those that 
are minor or transitory. Among the topics handled 
are: judgments and misjudgments; the meaning of 
maladjustment; the handling of aggressive and re- 
gressive reactions; and a practical guide to action. 
It also includes an index of examples and a good 
general index.—B. M. L. 





Note: The reviews are prepared by the Editor 
and Associate Editors, who may be identified by 
their initials. 


Eissler, Ruth S., et al. (Eds.) The psychoanalytic 
study of the child. (Vol. VI.) New York: 
International Universities Press, 1951. Pp. 398. $7.50. 


Volume VI of the Psychoanalytic Study of the 
Child, an annual collection of papers, consists of 
twenty-two papers divided into five parts: 
Problems of Child Development; Problems of 
Masturbation; Early Childhood; Latency; and 
Adolescence. 

Part 1 consists of four papers presented at the 
Anna Freud Meeting in Stockbridge, Mass., in 
April, 1950. The “opening remarks” of that 
symposium by Ernst Kris constitute an “historically 
oriented survey” of developments in psychoanalytic 
child psychology. Anna Freud’s paper at this 
meeting, “Observations on Child Development” is 
actually a summary of her familiar work at the 
Hampstead Nurseries, jointly with Dorothy 
Burlingham, reported in War and Children (1942) and in 
Infants Without Families (1944). Dorothy 
Burlingham contributed a paper to this section on 
mother-child relationship, which is essentially a very brief 
discussion of the analysis of a three and one-half 
year old boy. The concluding paper of the section 
by Marian C. Putnam, Beata Rank, and Samuel 
Kaplan, presents a much more satisfactory case 
discussion of “A Case of Primal Depression in an 
Infant.” 

Part 2 is a symposium on masturbation held at 
the New York Psychoanalytic Society in 1950, based 
on the 1912 statement on the subject by Victor 
Tausk. The original paper is translated from the 
German, and Annie Reich, Ernst Kris, and Milton 
Levine are the discussants of the Tausk and other 
papers of the 1912 symposium. Reich’s and Kris’s 
discussions bring out the developments in viewpoint 
since 1912, while Levine presents observations of 
children’s sexual activities as noted by a pediatrician 
over many years. 

Part 3 is a collection of seven papers on early 
childhood, beginning with Anna Freud’s “An 
Experiment in Group Upbringing,” written in 
collaboration with Sophie Dann. One of the most 
interesting in the volume, this paper describes the 
personality development of six children who, in the 
first year of life, were deprived of both parents by 
Nazi gas chambers, and who had known no life 
in a family setting at the time of the study, several 
years later. Anna Freud’s conclusions are among the 
most hopeful seen in child development literature 
for many years. Contrary to accepted opinion, these 
children did not seem headed for a life of very 
severe maladjustment usually anticipated for children 
seriously deprived emotionally in the first year of 
life—“. . . they were neither deficient, delinquent, 
nor psychotic.” 

In a brief review it is impossible to do justice to 
twenty-two papers. A number of interesting 
theoretical treatments and case discussions are presented 
in this volume. A few of the unusual chapters can 
be mentioned, however: Dorothy Burlingham has 
an interesting presentation of “Precursors of Some 
Psychoanalytic Ideas About Children in the 
Sixteenth and Seventeenth Centuries.” René Spitz 
attempts an etiologic classification of “The 
Psychogenic Diseases in Infancy.” Selma Fraiberg, in a 
chapter on “Enlightenment and Confusion,” in 
which the imparting of sex information to children 
is discussed, quotes Freud—“I am far from 
maintaining that this (enlightenment) is a harmful or 
unnecessary thing to do, but it is clear that the 
prophylactic effect of this liberal measure has been 
vastly overestimated.” Some of our child study and 
parent education groups will be horrified by this 
heresy. Leo Spiegel, in “A Review of Contributions 
to Psychoanalytic Theory of Adolescence” not only 
reviews the literature (41 references, from Freud to 
Kinsey), but organizes the contributions in a 
manner that adds considerably to their meaning. 

Volume VI of The Psychoanalytic Study of the 
Child seems a more mature publication than some 
of the earlier ones, not because it is the most recent, 
but because the treatment of the same material is 
more mature in many of the papers.—M. K. 


Jeffress, Lloyd A. (Ed.) Cerebral mechanisms in 
behavior. New York: Wiley, 1951. Pp. xiv + 
311. $6.50. 


This is the Hixon Symposium held at the Cali- 
fornia Institute of Technology in the fall of 1948 
under the general chairmanship of Dr. Jeffress, who 
has subsequently done an excellent job of editing 
the proceedings. There is a contribution by von 
Neumann on the theory of automata, a cybernetic- 
ally oriented paper by McCulloch, one by Lashley 
on the problem of serial order in behavior, one by 
Klüver on functional differences between the 
occipital and temporal lobes, one by Köhler on 
perception, and a discussion of brain and intelligence 
by Halstead. These are top-notch specialists, and 
their papers are well above those contributed to 
the usual symposium. The discussants include 
Brosin, Gerard, Liddell, Lindsley, Lorente de Nó, 
Nielsen, and Weiss, whose discussions are lively and 
unusually stimulating. The result is a first-rate book 
which should be required reading for everyone in- 
terested in cerebral mechanisms.—W. A. H. 


Kelly, E. Lowell, and Fiske, Donald W. The pre- 
diction of performance in clinical psychology. 
Ann Arbor: Univ. of Michigan Press, 1951. Pp. 
xv + 311. $5.00. 
This is the report of the five-year Michigan 
research project on the evaluation of techniques for 
selecting professional personnel, aimed particularly 
at the prediction of the performance of VA trainees 
in the four-year doctoral program in clinical 
psychology. Preliminary reports of the results of the 
investigation have aroused such controversial 
comment that readers will welcome the opportunity this 
volume provides to follow in detail the rationale, 
procedures, results, and implications of the project. 

The immediate results are provocative in 
demonstrating individual differences among trainees, 
training institutions, and training programs, in pointing 
up similarities between clinical and nonclinical 
graduate students and between graduate students 
in clinical psychology and psychiatric residents, and 
in calling into question the predictive efficiency of 
many widely-used diagnostic instruments. The 
volume as a whole, however, has significance far 
beyond the prediction of clinical competence. The 
major contribution of the study may well prove to lie 
in its development of research methodology 
appropriate to many complex problems in the field of 
clinical psychology. The report includes, for 
example, careful analyses of the processes of defining 
criteria and making predictions, a comparative 
evaluation of statistical and clinical approaches to 
prediction, a step-by-step description of the 
development of a Diagnostic Prediction Test, a critical 
evaluation of the rating scale as a method of 
personality assessment, and a straightforward handling 
of such difficult problems as the measurement of 
therapeutic competence and of research competence. 
The authors of this report recognize and state with 
candor the limitations of their data, but they still 
demonstrate convincingly the advantages of 
attacking complex clinical problems with rigorous 
methods of investigation. This report will surely be read 
not only by professional persons concerned with 
personnel selection and with graduate training, but 
also by psychologists interested in advances in 
research methodology.—A/. M. 


Kuhlen, Raymond G. The psychology of adolescent 
development. New York: Harper, 1952. Pp. xvii + 
675. $5.00. 


This volume was designed frankly as a text for 
heterogeneous college classes on adolescence. It at- 
tempts to integrate representative studies in the 
existing psychological and related literature, and 
therefore reflects an over-all nomothetic organiza- 
tion traditional in this area of psychology. The au- 
thor is cognizant of the limitations of a study of the 
“average” adolescent and has included a few case 
sketches and a chapter (which, incidentally, he sug- 
gests some instructors may wish to omit) on the in- 
dividual adolescent. Whereas the title and several 
chapters emphasize development as a dynamic lon- 
gitudinal process, the book as a whole seems cross 
sectional in major emphasis. Dr. Kuhlen discusses 
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the relationship of the teen years to the prepubic 
and adult periods and emphasizes the importance of 
the total teener in a social milieu. He attempts to 
interrelate the somewhat static and atomistic data 
and charts in psychological literature with the mo- 
lar accent which practical work with adolescents 
demand. The complex process by which inner 
changes occur and the on-going activities of inte 
gration of oneself with the community and new ex- 
periences needs fuller treatment, however, in the 
reviewer's opinion, in courses in adolescent psychol- 
ogy. Chapters deal with these topics as related to 
adolescence: physical development, ability, culture, 
interests, motivations and emotions, social adjust-
ment, civic competency, ideology, education, voca-
tion, and the home. This probably will be a widely
adopted text since it is interestingly written, is not
abstruse, and is replete with charts, tables, and other
vivid material about the teener.—F. McK.


Ralli, Elaine P. (Ed.) Adrenal cortex. Transactions 
of the Second Conference, November 16-17, 1950. 
New York: Josiah Macy, Jr. Foundation, 1951. 
Pp. 209. $3.00. 

This conference report consists mainly of five 
papers dealing with research on the hormones of 
the adrenal cortex and the related pituitary hor- 
mone ACTH. Although most of the discussion is 
oriented to the chemical and physiological aspects, 
it furnishes valuable background for psychologists 
interested in the related behavioral problems, such 
as the role of adrenal cortex hormones in reactions
to “shock” and “stress.”—L. F. S.


Watson, Robert I. The clinical method in psychol- 
ogy. New York: Harper, 1951. Pp. xii + 779. 
$5.00. 


Watson’s volume is a systematic, remarkably com- 
prehensive, and clearly written survey of the entire 
field of clinical psychology. After an introductory 
chapter on the clinical method, fifteen chapters are 
devoted to diagnostic methods and seven to psycho-
therapy. Although there is no section on clinical re-
search as such, relevant research studies are cited
throughout. The book’s limitations are imposed 
mainly by the status of clinical psychology. Not all 
diagnostic techniques can be included in a single 
volume, and the author’s decision seems a wise one, 
to give comprehensive coverage to selected methods 
instead of superficial mention of many. Among pro- 
jective methods, only the TAT and the Picture- 
Frustration Study are described. The chapters on 
psychotherapy discuss the psychoanalytic, client-
centered, and eclectic approaches, but emphasize
the common factors more than the differences. The 
section on child therapy is good. All discussions of 
psychotherapy are mainly at the descriptive or 
operating level. The brief accounts of analytic and 
nondirective theory are hardly adequate, and no at-
tempt is made to unify the approaches in terms of
a common learning theory. The scope of the book
casts some doubt on its exact place in the curricu-
lum, because diagnostic and therapeutic methods
are not often covered in the same course. Teachers
of clinical psychology will, however, find ways to
use this good resource. The volume is integrated
with the author’s earlier Readings (see J. consult.
Psychol., 1949, 13, 148).—L. F. S.


Tests 


Bellak, Leopold. Bellak TAT Blank (Revised Form,
1951). TAT blank, 1 per subject ($1.10 per 10);
analysis sheet, 1 per story ($1.50 per 100); man-
ual, pp. 11, (35¢); sample set (60¢). New York:
Psychological Corp., 1951.


Like its earlier version (see J. consult. Psychol.,
1948, 12, 126), the Bellak form consists mainly of
blank pages, except for the detailed analysis sheet
which is used for each story produced by the ex-
aminee. The analysis has been revised entirely, and
now consists of ten sections: main theme; main
hero; main needs of hero; conception of environ-
ment; parental, contemporary and junior figures;
significant conflicts; nature of anxieties; main de-
fenses; severity of superego; and integration of
ego. The manual describes the administration of the
TAT and the use of the analysis form.—L. F. S.


Bennett, George K., & Gelink, Marjorie. Short Em-
ployment Tests. Adult applicants for clerical
work. Tests V, N, and CA; 4 forms of each.
(7) min., each test. Test booklets ($1.60 per 25,
each form); with keys; preliminary manual, pp.
8, (35¢). Restricted distribution. New York: Psy-
chological Corp., 1951.


These tests meet a need for a twenty-minute em-
ployment battery which yields more information
than a single-score mental ability test. Test V has
50 multiple-choice vocabulary items, selected to give
maximum spread among clerical applicants. Test
N, of 90 simple arithmetic computations, and CA,
of 60 items involving name and number checking,
are primarily measures of speed and accuracy. The
manual is modestly called “preliminary” because
no evidence of validity is yet available beyond the
tests’ correlations with older and longer measures
of a similar nature. Alternate-form reliabilities
range from .81 to .93, and test intercorrelations are
generally of the order of .35. Percentile norms are
supplied for three groups, one of them a sizable
sample of 1341 applicants for office work in banks.
The distribution of the tests is restricted to em-
ployment offices.—L. F. S.


Cook, Walter W., Leeds, Carroll H., & Callis, Rob-
ert. Minnesota Teacher Attitude Inventory.
Teachers and teacher-training students. 1 form.
Untimed, (20-30) min. Booklet ($3.00 per 25);
IBM answer sheet ($1.85 per 50); keys (50¢);
manual, pp. 15, (60¢); sample set (60¢). New
York: Psychological Corp., 1951.


The outcome of a decade of investigations, the
MTAI has emerged as a successful tool for meas-
uring teachers’ attitudes toward children. The ex-
aminee responds to each of the 150 items on a five-
point scale ranging from “Strongly agree” to
“strongly disagree.” High scores indicate attitudes 
tending toward harmonious relationships and sym- 
pathetic understanding; low scores show hostility 
and need to dominate. Extensive research on item 
selection retained items that correlated with criteria 
of excellence of classroom relationships obtained 
from pupils, principals, and expert observers. Items 
were discarded which were highly influenced by 
years of professional experience or by the content of 
formal courses in education. Because the Inventory 
is recommended for the selection of teachers and 
for the admission of students to teacher training, 
the problem of “faking good” is relevant, but both 
experimental studies and logical analysis show that 
it is not a serious handicap. Norms are given for 
various groups of teachers and students, totaling 
3820 cases. The MTAI and the research underlying 
it provide an outstanding model of how to quan- 
tify a significant but “intangible” psychological vari- 
able.—L. F. S. 


Mooney, Ross L., & Gordon, Leonard V. Mooney 
Problem Check List. Forms Jm, Hm, and Cm, 
for grades 7-9, 10-12, and college. Untimed, (35-
50) min. Blank ($1.65 per 25); IBM answer
sheet ($1.85 per 50); manual, pp. 15, (25¢); 
sample set (35¢). New York: Psychological Corp., 
1951. 


The Problem Check Lists (see J. consult. Psy- 
chol., 1951, 15, 170) are now provided with ma- 
chine-scoring answer sheets, which permit item 
analyses by use of the graphic item counter. The 
answer sheet is arranged to yield problem area 
counts for individuals.—L. F. S. 


Remmers, H. H., & Bauernfeind, Robert H. SRA 
Junior Inventory. Grades 4-8. 1 form. Untimed, 
(40) min. Reusable booklet (49¢); carbon answer
pad ($1.90 per 25); self-interpreting profile leaf-
let ($1.15 per 25); with manual, pp. 16 (25¢);
specimen set (75¢). IBM available. Chicago: Sci- 
ence Research Associates, 1951. 


The Junior Inventory extends downward to the 
age range of 10 to 14 years the method used by the 
SRA Youth Inventory (see J. consult. Psychol., 1949, 
13, 453). From 223 simply worded statements, scores 
are obtained which show the scope of the problems 
experienced and reported by a child in the areas of 
health, getting along with other people, school, self, 
and home. The Manual is excellent, both in the ex- 
tent of its technical information and in the modesty 
of the suggested practical applications. Full data
are given on the sources of items, the item analyses
by internal consistency, the reliabilities of scores, 
and the norms in terms of area scores and specific 
items. Applications are recommended for the use of 
Inventory results by school administrators and teach- 
ers in planning programs, by counselors in helping 
groups and individuals, and by pupils in clarifying 
their own problems.—L. F. S. 


Seashore, Harold G., & Orbach, Charles E. Store
Personnel Test, Form FS. 1 form. 20 (25) min. 
Test booklet ($2.75 per 25); with key, and man- 
ual, pp. 7. Restricted distribution. New York: 
Psychological Corp., 1946, 1951. 


This specialized test for selecting employees of 
retail food stores consists of two parts: “checking,” 
a test of clerical speed and accuracy, and “prob- 
lems,” a measure of verbal and numerical ability 
equivalent to a group intelligence test. All items 
are phrased in terms relevant to the retail food 
trade. Reliability is about .90, and reasonable va- 
lidity is shown by the prediction of job ratings. 
Distribution is restricted to central personnel of-
fices of food store organizations.—L. F. S.


Stromberg, Eleroy L. Stromberg Dexterity Test. 
High school-adult. Apparatus test. 1 form. Work-
limit test, (5-10) min. Test board ($30.00); pre-
liminary manual, pp. 8, (35¢). New York: Psy-
chological Corp., 1951.


The SDT is an arm-hand-movement placing task 
which also involves a simple color discrimination. 
Each trial requires the placing of 54 colored disks; 
there are two practice trials and two timed test 
trials. The board automatically returns the disks to 
their proper places for the succeeding trial. Spear- 
man-Brown corrected reliability based on the two 
timed trials is .84 to .90 for various groups. Pre- 
liminary validity studies show significant discrimi- 
nations between good and poor foundry molders. 
The test has also been used with a number of other 
manual occupations, and tentative percentile norms 
are given for small samples of persons in six trades 
and for technical school students. Although it is not 
yet shown that the SDT is more efficient than other 
long-used dexterity tests, its color sorting is novel. 
The test may well become a useful part of the in-
dustrial psychologist’s equipment.—L. F. S.


Strong, Edward K., Jr. Manual for the Vocational 
Interest Blank for Women. Revised Blank (Form 
W) and Scales. Stanford, Calif.: Stanford Univ. 
Press, 1951. Pp. 19. 15¢. 


Most of the content of this revised manual will 
seem quite familiar to users of the earlier editions. 
There are some new data on criterion samples, and 
some realignments of occupational groupings. Un-
fortunately, the manual is marred by editorial care-
lessness. For example, there are two references
to “Table 1” when “Table 2” is clearly intended;
the headnote of Table 4 is either incomprehensible 
or in error. Careful reading of the manual also sug- 
gests that most of the beliefs about the validity of 
the women’s VIB are derived by analogy from 
studies of the men’s forms. Although the women’s 
blank is not without merit, it requires much further 
research.—L. F. S. 


Books Received 


Andia, Ernesto Daniel. Diagnosis de la poesia y su
arquetipo. Buenos Aires, Argentina: Editorial El
Ateneo, 1951. Pp. 335. $4.00 (U. S.).


McKean, Ellie. It’s mine! Foreword by L. K. Frank. 
New York: Vanguard Press, 1951. Pp. 40. $2.00. 


Moloney, James Clark. The battle for mental health. 
New York: Philosophical Library, 1952. Pp. x + 
105. $3.50. 


Singer, Eric. The graphologist’s alphabet. New
York: Philosophical Library, 1951. Pp. 118. $3.75.


As Dr. John A. P. Millet, Chief Psychiatrist of the American Rehabilitation Com-
mittee, said in a recent letter introducing our journal to the members of the American
Psychiatric Association, “Science which is only interested in mechanistic solutions leaves
its exponents bewildered and frustrated . . . religion can provide the conviction that
the goals of our patients’ efforts to get well are worth the struggle . . .”

Dr. William C. Menninger says: “PASTORAL PSYCHOLOGY fills a much felt need.
. . . I am confident that it will be of much interest and helpful to psychiatrists.”


Here are a few of our important articles:


THE MATURE PERSONALITY    Gordon W. Allport, Ph.D.
RELIGIOUS APPLICATIONS OF PSYCHIATRY    Karl A. Menninger, M.D.
DEALING WITH INTERPERSONAL CONFLICT    Carl R. Rogers, Ph.D.
RELIGION, PSYCHOTHERAPY, AND THE ACHIEVEMENT OF SELFHOOD    Rollo May, Ph.D.
DISCIPLINE AND MENTAL HEALTH    O. Hobart Mowrer, Ph.D.
THE ART OF DREAM INTERPRETATION    Erich Fromm, Ph.D.
SEX AS AN EXPRESSION OF PERSONALITY AND SOCIAL VALUES    Lawrence K. Frank
THE ACT OF SURRENDER IN THE TREATMENT OF THE ALCOHOLIC    Harry M. Tiebout, M.D.
THE TREATMENT OF THE HOMOSEXUAL    George W. Henry, M.D.
THE GROWTH FACTOR IN CHILD PERSONALITY    Arnold Gesell, M.D.
EMOTIONAL FACTORS IN ACCIDENT PRONENESS    Henry H. Brewster, M.D.
PSYCHOSOMATIC ASPECTS OF LOVE    Gotthard Booth, M.D.
DYNAMICS OF PERSONALITY DEVELOPMENT    Franz Alexander, M.D.
BODY, MIND, AND SPIRIT    John A. P. Millet, M.D.


You, too, will find PASTORAL PSYCHOLOGY a practical, helpful tool in your work.


ical neuro-physiology.

The section on Psychiatry covers the topics usually 
contained within the meaning of the word. In addition, 
the following special branches are dealt with fully: 

* Psychoanalysis, individual and analytical psychology 


* Sexology, criminology, alcoholism, and drug addic-
tion as related to psychiatry


* Social and industrial psychology and psychiatry, vo-
Psychometrics
Personality testing and the Rorschach test 


* Heredity and statistical studies as they apply to psy-
chiatry


* Mental defect and epilepsy